As reported in the previous post in this series, although the compiled routines included in packages such as Numpy, Scipy, and Pysparse are very fast, other parts of the code turned out to be slower than the equivalent in VBA. Some experimentation with the frame analysis code has found two simple strategies that can substantially improve this situation:
- Where the code is performing any function that is included in any available compiled library, use the compiled version.
- Any block of code that is called repeatedly can be converted to use a “just in time” compiler, such as Numba.
In the case of the frame analysis program, the following code is called for each frame member in assembling the global stiffness matrix:
for i in range(0 , 12): for j in range(0 , 12): sum = 0.0 for k in range(0 , 12): sum = sum + tt[i, k] * cc[k, j] km[i, j] = sum
This code is performing matrix multiplication, and can be replace with a single line calling the Numpy dot function:
km = npy.dot(tt, cc)
This single change reduced the time to generate the solver input data from about 30 seconds to 6 seconds (the run time for the modified code was reduced by a factor of about 100). Some further modification of the Python code brought this down to about 4 seconds, which was faster than the VBA code, but was still longer than the run time for the solver, which was doing all the hard work, so there was clearly room for further improvement.
This was achieved using Numba, which to quote their web site is:
a just-in-time specializing compiler which compiles annotated Python and NumPy code to LLVM (through decorators). Its goal is to seamlessly integrate with the Python scientific software stack and produce optimized native code, as well as integrate with native foreign languages.
Implementing the Numba routines proved to be surprisingly easy; an example is shown below:
from numba import double, jit, autojit ... fastkva = autojit(py_formkv) lastk = fastkva(km, g, lastk, kva) def py_formkv(km, G, k, kva): idof = km.shape[0] for i in range(0,idof): m = G[i] if m != 0: for j in range(0,idof): n = G[j] if n != 0: K = km[i,j] if K != 0: icd = n - m if icd >= 0: kva[k,2] = K #km[i,j] kva[k,0] = n kva[k,1] = m k = k+1 return k
The function py_formkv is standard Python. The function fastkva is defined as a Numba compiled version of py_formkv, and this may then be called in the same way as any other function. This single call to Numba reduced the run time from 4 seconds to less than 1 second. Further improvements to the data generation time would no doubt be possible, but I will now be concentrating on the post-processing, currently all done in VBA, which takes about 5 seconds.
See more about Numba at: Numba vs. Cython: Take 2, where the author (who unlike me is an experienced Python programmer) has compared Numba to several alternatives, and found it to be the fastest (with a 1400 x speed-up, compared with the original Python code!).