As noted in the comments here, the main bottleneck in the Python matrix solver functions presented recently was not in the data transfer from Excel, but rather in the creation of the Numpy arrays for very long lists of short lists (see this Stackoverflow thread for more details of the background). It seems there is a substantial time saving in converting the long array into 1D vectors, which can be converted into Numpy arrays very much more quickly. The VBA code below converts a 3D array of any length (up to the maximum allowed in Excel) to 3 vectors.
Function Vectorize(x As Variant, x_1 As Variant, x_2 As Variant, x_3 As Variant) As Long Dim nr As Long nr = x.Rows.Count x_1 = x.Resize(nr, 1).Value2 x_2 = x.Offset(0, 1).Resize(nr, 1).Value2 x_3 = x.Offset(0, 2).Resize(nr, 1).Value2 Vectorize = nr End Function
To transfer these vectors to Python, via ExcelPython, the PyObj function must be used:
Set x = Range("SSA") ' Excel range with 500500 rows and 3 columns n = Vectorize(x, x_1, x_2, x_3) 'Convert range values to 3 vectors ' Create ExcelPython objects for transfer to Python Set x_1 = PyObj(x_1, 1) Set x_2 = PyObj(x_2, 1) Set x_3 = PyObj(x_3, 1)
In Python the three vectors are converted to Numpy arrays:
def xl_getnpvect(x_1, x_2, x_3): timea = np.zeros(4) timea[0] = time.clock() x_1 = np.array(x_1) timea[1] = time.clock() x_2 = np.array(x_2) timea[2] = time.clock() x_3 = np.array(x_3) timea[3] = time.clock() return timea.tolist()
The table below compares the data transfer and conversion times using this method on an Excel range of 500500 rows x 3 columns, with the same operation using a 2D variant array.
The times for solution of a large sparse matrix system (10945×10945 matrix), using the new vector transfer routine, are shown below:
The data transfer and array creation times are now a relatively small proportion of the total solution time, even for the iterative solver with a solve time of only 0.28 seconds.
Your code looks good. Thanks for sharing.
LikeLike
maybe it might be of help to force vba to pass data a ONE-dimensional array (as a list), and then
make the reshape to right rows/columns in numpy (maybe also specifying the type and the
column order).
With a tweak, it is possible to do it WITHOUT copying the range, but simply forcing the array to think
(temporarily) that it is one-dimensional. The following code does just this, and then restores the
array:
Option Explicit
Option Base 1
Declare Sub CopyMemory Lib “kernel32” Alias “RtlMoveMemory” (pvDest As Any, lpvSource As Any, ByVal cbCopy As Long)
Declare Function VarPtrArray Lib “msvbvm60.dll” Alias “VarPtr” (Ptr() As Any) As Long
Private Const VT_BYREF = &H4000&
Public Type SAFEARRAYBOUND
cElements As Long
lLbound As Long
End Type
Public Type SAFEARRAY
cDims As Integer
fFeatures As Integer
cbElements As Long
cLocks As Long
pvData As Long
rgsabound(3) As SAFEARRAYBOUND
End Type
Function Rank(A) As Long
Dim lp As Long, VType As Integer, sa As SAFEARRAY, n As Long
Rank = 0
If Not IsArray(A) Then Exit Function
With sa
CopyMemory VType, A, 2
CopyMemory lp, ByVal VarPtr(A) + 8, 4
If (VType And VT_BYREF) 0 Then
CopyMemory lp, ByVal lp, 4
End If
CopyMemory .cDims, ByVal lp, 2
Rank = .cDims
End With
End Function
Function Make1Dim(A, m As Long)
Dim lp As Long, VType As Integer, sa As SAFEARRAY, n As Long
If Not IsArray(A) Then Exit Function
With sa
CopyMemory VType, A, 2
CopyMemory lp, ByVal VarPtr(A) + 8, 4
If (VType And VT_BYREF) 0 Then
CopyMemory lp, ByVal lp, 4
End If
CopyMemory .cDims, ByVal lp, 16
CopyMemory .rgsabound(1), ByVal lp + 16, 2 * Len(.rgsabound(1))
m = .rgsabound(2).cElements
n = .rgsabound(1).cElements
.rgsabound(1).cElements = m * n
.rgsabound(2).cElements = m * n
.cDims = 1
CopyMemory ByVal lp + 16, .rgsabound(1), 2 * Len(.rgsabound(1))
CopyMemory ByVal lp, .cDims, 16
End With
Make1Dim = A
End Function
Sub reset2Dim(A, m As Long)
Dim lp As Long, n As Long, VType As Integer, sa As SAFEARRAY
If Not IsArray(A) Then Exit Sub
With sa
CopyMemory VType, A, 2
CopyMemory lp, ByVal VarPtr(A) + 8, 4
If (VType And VT_BYREF) 0 Then
CopyMemory lp, ByVal lp, 4
End If
CopyMemory .cDims, ByVal lp, 16
CopyMemory .rgsabound(1), ByVal lp + 16, 2 * Len(.rgsabound(1))
.rgsabound(2).cElements = m
.rgsabound(1).cElements = .rgsabound(1).cElements \ m
.cDims = 2
CopyMemory ByVal lp + 16, .rgsabound(1), 2 * Len(.rgsabound(1))
CopyMemory ByVal lp, .cDims, 16
End With
End Sub
Sub sho(r, A)
Dim n As Integer
n = Rank(A)
Select Case n
Case 0
r = A
Case 1
r.Resize(1, UBound(A, 1)).Value2 = A
Case 2
r.Resize(UBound(A, 1), UBound(A, 2)).Value2 = A
End Select
End Sub
Sub Test()
Dim A, m As Long ‘ m saves the right no. of rows
A = [B3:D6]
sho [B10], Make1Dim(A, m)
reset2Dim A, m
sho [B13], A
End Sub
just put numbers in the range B3:D6, and run Test.
LikeLike
Hi Maurizio, thanks for the suggestion and the code.
I had some problems with the code:
After pasting into the VBE, and replacing the WordPress ” and ‘ with the standard versions, I still had three lines marked as errors:
If (VType And VT_BYREF) 0 Then
I have deleted the 0, and the code now seems to work. Does that make sense?
I’m still not sure that I see the advantage of this approach though. With my code I get three 1D lists transferred to Python, which is quite convenient. I guess with a large rectangular array your way would be better though. Are there other advantages that I’m missing?
LikeLike
the original line had “If (VType And VT_BYREF) 0 Then” (i.e., not equal to zero), so your
change is equivalent. In reality, I think that in the present context it’s even safe to delete those
three lines. Regarding your question:
1) as you guess, for a large dense matrix definitively there should be a speed advantage. In python it is quite easy to restore the dense matrix, something like (on the premise that i’m NOT a python expert)
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
A=np.reshape(asfortranarray(x,dtype=’f8′),(3,-1),order=’F’)
(it makes a 3 x 2 matrix).
2) Similarly, if you need three vectors, i think you can get them in the python side, and possibly the whole process might be faster than the three calls;
3) This approach can give some speed advantage in any vba code, in cases where it is appropriate to process a matrix in vba with a single loop and a single index (add a scalar to the whole matrix, etc.) Something that it was already possible in the old vb6, just blanking the check on the array indexes and using just one index, but that is impossible in vba.
Is it worth? difficult to answer, maybe it is an option to have just in case.
A final comment on your interface to ExcelPython: it is perfect for spreadsheets formulas, as it works on the sequence
range -> variant -> py object -> python processing -> py object -> variant
but this is overkill for vba programmnig, where i’d like to have also internal routines like
py object -> python processing -> py object
in vba, just use
dim A
set A = ….. (py call)
and then the object A is alive and it can be used as such in other vba/ExcelPython processing. I think this “double” interface might increase the appeal of the whole thing.
Best regards, maurizio
LikeLike