Linking to Python help from Excel

The py_Numpy spreadsheet presented in the previous post has been updated:

  • A large number of Numpy functions added.
  • Return function help documentation to the spreadsheet for a selected function.
  • View Numpy on-line documentation from the function Wizard.

The new version can be downloaded from:

py_Numpy.zip

As before, the pyxll add-in is required for the connection from Excel to Python.

All the available functions are listed on the first sheet. Enter the index number for a function in cell F4, and the help for that function will be displayed:

The on-line help for all the listed Numpy functions can also be accessed quickly and easily through the Function Wizard.

Select a function and click on the “Insert Function” icon (immediately to the left of the Edit bar):

Then click “Help on this function” in the bottom left corner:

The Numpy on-line help for the selected function is displayed.

Note that this is a work in progress. Connecting to the on-line help requires a different html path to be generated for each function, and the path names are not always consistent. If you find any Numpy functions where the help is not displayed, please let me know.

Posted in Excel, Link to Python, Newton, NumPy and SciPy, PyXLL, UDFs | Tagged , , , , , , | Leave a comment

Calling Numpy polynomial functions from Excel

The Numpy polynomial related function discussed in recent posts can now be downloaded from:

py_Numpy.zip

The download file includes Python code and a sample spreadsheet. Required installed software is Python, Numba and pyxll to call the code from Excel. Some of the functions in the default file require the just-in-time compiler, Numba. For those without Numba installed the file pyNumpy-noJIT.py can be used.

Note that the Quadratic, Cubic and Quartic functions (which are not included in Numpy) are based on C code from the following sources:

The screenshots below show examples of the spreadsheet functions: (click on any image for full-size view).

The py_PolyRoots function calls the general purpose Numpy function, that will work for any degree polynomial. The alternative functions have very much better performance for degrees up to quartic.

Py_PolyFromRoots finds polynomial factors from its roots. Py_PolyVal returns the function value for any value of x, allowing array input.

Py_PolyFit fits a polynomial to input data:

Other Numpy polynomial functions are shown below:

The py_flip functions reverses a Numpy array along a specified axis:

Finally functions are provided to form a Numpy array of complex numbers from an Excel list of pairs of floats:

Posted in Curve fitting, Excel, Link to Python, Maths, NumPy and SciPy, PyXLL, UDFs | Tagged , , , , | 1 Comment

Solving cubic equations – background and timing

For more information on the polynomial functions, including quartic and higher order solvers, see: Solving Quadratic, Cubic, Quartic and higher order equations; examples

Re-reading the Wikipedia article on solving cubic equations, I noticed that the trigonometrical solution for finding 3 real roots seemed a very simple approach, which might be faster than the current code in my current VBA cubic function. (See Cubic Equations). Writing a new CubicT function, I found that this was actually slower than the original function (see more details below), and checking the code I found the original method for finding 3 roots was essentially the same as the Wikipedia method, and the method for 1 real root was simpler. Nonetheless, the code for the new function was better documented than the older code, so I have added the new function and a sheet documenting the method to the download file at:

Polynomial.zip

Times for 100,000 iterations of the new function are shown below, compared with the Cubic function (which returns the same results), CubicC (which also returns complex roots), and the Quartic function:

The VBA version of the CubicT function is more than twice as slow as the original Cubic function, presumably because the routine for finding single real roots is slower.

The code for all four functions was transferred to Python (called from Excel with pyxll), and as is typical with plain (and non-optimised) Python code, the performance was 2-3 times slower.

Modifying the Python code to use the Numba JIT compiler gave a substantial performance improvement, with the Python code now 2-5 times faster than the VBA. The CubicT function with Numba was greatly improved, and was slightly faster than Cubic. The Quartic function had an even greater improvement, and was faster than the CubicC function, and only a little slower than the other two Cubic functions.

Two Python functions also called the Numpy general purpose polynomial routines. These were both very slow, presumably because the solvers are written for polynomials of any degree, and are not optimised for cubic or quartic equations.

Note that the download file includes all the new VBA code, but not the new Python functions. This will be covered in a future post.

Posted in Excel, Link to Python, Maths, Newton, NumPy and SciPy, PyXLL, UDFs, VBA | Tagged , , , , , , , , , | Leave a comment

Working with Python polynomials from Excel

Python polynomial functions have several features that require manipulation of data passed from Excel:

  • In Python polynomial coefficients are listed in order of increasing powers of x (a + bx + cx^2 …), but they are more usually listed with the highest power first, including in my VBA polynomial functions.
  • Polynomial functions may have complex numbers as input, and often have complex output, but Excel complex numbers are limited in functionality and are displayed as text strings, which makes use of the values on the spreadsheet difficult. A more convenient format is to pass complex numbers to and from Python as pairs of floats, or 2D arrays of floats.

I have written 3 short Python functions to deal with passing arrays of complex numbers from and to Excel, using pyxll, or within Python code to convert arrays of pairs of floats to Python complex format.

The py_ReverseA function reverses a 1D or 2D array, and with a 2D array the axis to be reversed may be specified, or by default the longer axis is used:

@xl_func
@xl_arg('array1', 'numpy_array')
@xl_arg('axis', 'int')
@xl_return('numpy_array')
def py_ReverseA(array1, axis = None):
    ndims = len(array1.shape)
    if ndims == 1:
        return array1[::-1]
    else:
        if axis == None:
            axis = 0
            if array1.shape[1] > array1.shape[0]: axis = 1
        if axis == 0:
            return array1[::-1,:]
        else:
            return array1[:, ::-1]

py_FloatA2Complex converts an Excel range of values to a Python complex array. If the Excel range is a single column the values will be converted to complex numbers with an imaginary value of zero. Ranges with 2 columns or 2 rows will have pairs of values converted to Python complex values:

@xl_func
@xl_arg('array1', 'numpy_array', ndim = 2)
@xl_arg('axis', 'int')
@xl_return('var')
def py_FloatA2Complex(array1, axis = None):
    rows, cols = array1.shape
    if axis == None:
        if rows > 2 or rows >= cols:
            axis = 0
        else:
            axis = 1
    if axis == 0:
        complexa = np.zeros(rows, dtype=np.complex128)
        if cols > 1:
            complexa[:] = array1[:, 0] +1j*array1[:, 1]
        else:
            complexa[:] = array1[:, 0] +1j*np.zeros(rows)
    else:
        complexa = np.zeros(cols, dtype=np.complex128)
        if rows > 1:
            complexa[:] = array1[0, :] +1j*array1[1, :]
        else:
            complexa[:] = array1[0, :] +1j*np.zeros(cols)
    return complexa

Note that this function returns an array of complex numbers, which are primarily intended for use by other Python functions. If it is called from Excel it will return a cache object, displaying as “ndarray@18”. This can be returned back to Python, or another Python function (such as py_Complex2FloatA below) can be used to extract the data in a format that can be displayed by Excel.

Finally, py_Complex2FloatA converts a Python array of complex numbers to a 2 column or 2 row range of Excel doubles:

@xl_func
@xl_arg('array1', 'var')
@xl_arg('axis', 'int')
@xl_return('var')
def py_Complex2FloatA(array1, axis = 0):
    rtn = np.array([array1.real, array1.imag])
    if axis == 0: rtn = np.transpose(rtn)
    return rtn 

The screenshot below shows examples of each of these functions:

Posted in Arrays, Excel, Link to Python, PyXLL, UDFs | Tagged , , , , , , , , , | Leave a comment

A Match Function Bug …

… or at least, a non-intuitive feature of the MATCH function, that may give inconsistent results.

This post is based on a discussion at EngTips.

The problem under discussion was to find the number of the first and last rows containing data in a column, when the data was in a single block, with increasing values. The suggested formulas were:

  • Start: =MATCH(TRUE,C6:C157<>0,0)+5
  • End: MATCH(TRUE,C6:C32<>0,1)+5

A simple example is shown below:

Note that the start and end row of the data in column G is reported correctly, and the start row is correct in both cases, but the formula for the end of the data in column C returns the end of the search range, rather than the end of the data.

Looking at the formula in more detail:
MATCH(TRUE,C6:C32<>0,1)
C6:C32<>0 returns an array of TRUE or FALSE values, as shown in columns D and H.

MATCH(TRUE,{ARRAY},1) should return the row number in ARRAY of the last cell containing a TRUE value (which is how it works in column G), but the last argument is the “match type”, with 1 indicating that the last row number with contents less than or equal to the match value should be returned. Because the data is supposed to be increasing in value the function performs a binary search. Starting at the centre of the search range, if this value is blank (and therefore less than the search value), it will check only cells further down the range, and since all these are also blank, it ends up in the last cell of the range, and returns this as the answer.

Two alternative formulas are given are shown in the spreadsheet, which will always give the correct result, provided that the data is continuous and increasing:

  • =MATCH(MAX(C6:C32),C6:C32,0) + 5 (where 5 is the number of the row immediately above the search range.
  • =COUNTA(C6:C32)+C1-1 (where C1 is the row number of the start of the data)

Note that the first formula will give the correct result if the data has gaps, provided that the last cell with data has the maximum value. For the second formula the data does not need to be increasing in value, but it must be continuous.

Posted in Arrays, Excel | Tagged , , , | 3 Comments