Following the previous ExcelPython post I have modified the VBA and Python code for improved performance and to simplify the Python coding. The main change is that where Numpy Arrays are required in the Python code, these are now created in the VBA, as described here (see post number 9 from Eric Reynolds) and here (a more detailed and wide ranging tutorial).
The new spreadsheet and Python Code may be downloaded from MatrixXLpy.zip including full open source code.
I have also updated the performance benchmarks as shown below:
I have now separated out the data transfer times from the actual solution time. It can be seen that when operating on a full dense matrix the xl_Solve function is very much faster than any of the sparse functions, but on the 1000×1000 sparse matrix all of the functions performed in much the same time. With the large (10945×10945) sparse matrix (which is too large for the dense functions to handle) the xl_SpSolveCOO function (using the Scipy Library) was significantly faster than the PySparse based xl_SpSolveLU, but the iterative xl_SpSolveit function (also using the PySparse Library) was 5 times faster again. The PySparse based functions operate on the non-zero elements of the lower triangle of symmetrical matrices, allowing substantially larger problems to be solved than could be handled by the xl_SpSolveCOO function.
Coding for the xl_SpSolveCOO function is shown below:
Function xl_SpSolveCOO(SMatrix As Range, SVect As Range) As Variant Dim Rtn As Variant Rtn = xl_Run2A("xl_spsolvecooL", SMatrix, SVect, modname2, 1) If TypeName(Rtn) = "String" Then xl_SpSolveCOO = Rtn Else xl_SpSolveCOO = WorksheetFunction.Transpose(Rtn) End If End Function
The Python code is called by the xl_Run2A function:
Function xl_Run2A(FuncName As String, VA_1 As Range, VA_2 As Range, Optional Modname As String = modname1, Optional RtnDims As Long = 2) As Variant Dim methods As Variant, result As Variant, VA_1py As Variant, VA_2py As Variant, NumpyArray As Variant, Result_List As Variant On Error GoTo rtnerr Set NumpyArray = PyGet(PyModule("numpy"), "array") Set VA_1py = PyCall(NumpyArray, , PyTuple(VA_1.Value2)) Set VA_2py = PyCall(NumpyArray, , PyTuple(VA_2.Value2)) Set methods = PyModule(Modname, AddPath:=Path1) Set result = PyCall(methods, FuncName, PyTuple(VA_1py, VA_2py)) Set Result_List = PyCall(result, "tolist") xl_Run2A = PyVar(Result_List, RtnDims) Exit Function rtnerr: xl_Run2A = Err.Description End Function
The associated Python code is:
import numpy as np import scipy.sparse as ssp import scipy.sparse.linalg as sspla from pysparse import spmatrix, superlu, precon, itsolvers import linalgfuncs as laf import time def xl_spsolvecooL(m, x): """ Solve a matrix equation Mx = y in sparse COO format, input from array. m: three column array of: row indeces, column indeces, values. x: column x vector """ i, j, v, x = list2ijvx(m, x) n = x.size A = ssp.coo_matrix((v,(i,j)),shape=(n,n)) return xl_spsolvecoo(A, x) def list2ijvx(m, x): """ Convert 3 column array, m, to 3 vectors, and single column array, x, to vector """ v = m[:,2] i = m[:,0] j = m[:,1] x = np.array(x) x = x[:,0] return i, j, v, x def xl_spsolvecoo(A, x): """ Solve a matrix equation Ax = y in sparse COO format. A: Sparse array in COO format. x: column x vector """ A = A.tocsr() A = sspla.spsolve(A, x) return A
Easier to understand and useful. This is one of the nice things a like about ExcelPython. Thanks for sharing.
LikeLike
Thanks for posting the data transfer times.
LikeLike
I was surprised that the actual solver time was such a small proportion of the total, especially for the 1000 x 1000 matrix. On the other hand it’s a bit misleading because in a real application it would make sense to generate the matrix in Python (or whatever language is being used for the solver), rather than transferring the whole thing from VBA.
But it does show that there is real scope for overall speed improvement by focussing on the matrix generation code.
LikeLike
This is why I love engineers. They measure things and look at the results. Wishful thinking won’t keep a bridge from collapsing. Not many professions have such a close connection to the physical world.
LikeLike
I can only agree 🙂
LikeLike
Reblogged this on SutoCom Solutions.
LikeLike
Hello Doug,
the link to the tutorial is now broken unfortunately as I have committed to the repository. You can replace it with the following URL which is kind of like a permalink:
https://github.com/ericremoreynolds/excelpython/blob/c0c4d642dfed45c0e8d10e20d7858ac084c83b42/Tutorial/4.%20Ranges%2C%20VBA%20arrays%20and%20NumPy%20types.md
Regards,
Eric.
LikeLike
Thanks Eric, the link is now updated,
Incidentally, I have just discovered that the major delay in getting the spreadsheet data into the fast solve routines does not occur in the transfer from VBA to Python, the really slow part is converting a long Python list of lists into a numpy array. I’ll post more details when I’ve worked out the best way round it.
LikeLike
That’s odd, I can’t think how else you could build a sparse matrix even in pure Python code. How long are your lists?
LikeLike
The lists I’m working with at the moment are up to 200,000 rows. For generating the sparse matrices the PySparse library has a function to generate a sparse matrix direct from a list of lists, but the Scipy sparse matrix routine needs 3 numpy 1D arrays. The quickest way I have found so far is to copy the three column long array into three separate row arrays with a For loop, then transfer those to Python as 1D lists, which will convert to 1D Numpy arrays quickly. I can generate single column arrays from the 3 column array in one operation in VBA with the Offset function, but these are transferred to Python as a long list of single item lists, which convert to Numpy arrays very slowly. Another option is to use the Worksheetfunction.Transpose function, but that doesn’t work on long arrays.
For my purposes it would be better if single column arrays from VBA were converted to single lists, as for single row arrays.
LikeLike
If I have understood correctly what you’re trying to do you should be able to do that with PyObj(rangeWithOneColumn.Value, 1)
That should generate a 1D list
LikeLike
Thanks Eric. I was using PyVar, rather than PyObj. I’ll have another go and report back.
LikeLike
Eric – yes, PyObj does what I want. The data transfer time is now down to about 5-10% of the total solution time, which is much healthier. I’ll do a post on it in the next few days.
LikeLike
Pingback: Data Transfer to Python – Update | Newton Excel Bach, not (just) an Excel Blog