Python for VBA users – 4; Python data types

Previous Python Post

The basic range of data types in Python is quite simple, but by the time we have added additional types used in the numpy package, and translation of Excel data types into the Python equivalent the picture gets considerably more complicated, and you can find yourself wasting significant amounts of time chasing down errors due to using incompatible data types.  This post will look at the main data types, including numpy arrays, with examples of their use in PyXll functions.

The Python code and associated Excel spreadsheet can be downloaded from pyTypes.zip.  To use the Python functions it is necessary to install the PyXll add-in, then either add “pyTypes” under modules in the pyxll.cfg file, or simply copy and paste all the code from pyTypes.py into the worksheetfuncs.py file, to be found in the PyXll\examples folder.

The PyXll manual lists the following PyXll data types, with their Python equivalent:

PyXll and Python Data Types

PyXll and Python Data Types

Note that the unicode type is only available from Excel 2007.

An Excel range of any of the above data types can be passed as a 2D array, which is treated in Python as a “list of lists”.  In addition the PyXll var type can be used for any of the above types, or a range.  A further option for transferring an Excel range  of numerical data is use of the numpy_array, numpy_row and numpy_column types.  Further options (which we will examine in a later post) are Custom Types and the xl_cell type.

The function var_pyxll_function_1 shown below (taken from the PyXll worksheetfuncs.py file) returns the name of the data type passed to the function. The var_pyxll_function_1a function simply returns the data back to the spreadsheet.

@xl_func("var x: string")
def var_pyxll_function_1(x):
    """takes an float, bool, string, None or array"""
    # we'll return the type of the object passed to us, pyxll
    # will then convert that to a string when it's returned to
    # excel.
    return type(x)

@xl_func("var x: var")
def var_pyxll_function_1a(x):
    return x

Examples of these functions are shown in the screenshot below:

Python basic data types

Python basic data types

The var_pyxll_function_1 function passes data to Python as a var, and returns a string with the Python data type name.  Note that:

  • All numbers (including integers) become Python floats.
  • Empty cells become Python NoneType
  • Strings (including complex number strings) become Python Unicode (in Excel 2007 and later).
  • Boolean values become Python bool.
  • Excel error values become different types of Python exceptions.
  • Both 1D and 2D Excel ranges become Python lists.  In fact in both cases the range becomes a “list of lists”.

The var_pyxll_function_1a function passes data to Python as a var, and simply returns the same data, without modification:

  • All data types other than a blank, including 1D and 2D ranges and error values, are passed back unchanged, however blank cells are passed back as a zero.

Further details of the handling of data are examined in four variants of a function converting an Excel complex number string to a pair of floating point values in adjacent cells:

@xl_func("var x: var")
def py_Complex1(x):
    """
    Convert a complex number string to a pair of floats.
    x:  Complex number string or float.
    """
    if isinstance(x, float) == False:
        if x[-1] == "i":
            x = x.replace("i", "j",1)
    c = complex(x)
    clist = list([[0.,0.]])
    clist[0][0] = c.real
    clist[0][1] = c.imag
    return clist

The code first checks if the function argument is a float.  If not it checks if the last character is an “i” (the default Excel imaginary number indicator), and if so changes it to a “j” (the required character for a Python complex number).  The code then:

  • Converts the Excel complex number string to a Python complex number.
  • Creates a python “list of lists” consisting of a single list containing two floats: [0., 0.]
  • Allocates the real and imaginary parts of the complex number to the list.  Note that the index values are always base zero, and that two index values are required; the first indicating list index 0 (i.e. the first and only list in the list of lists) and the second indicating the index value in the selected list.
  • The resulting list is then returned with the statement: return clist

Note the use of text within triple quotes at the top of the function.  This text will appear in the “Excel function wizard”, including the description of each function argument.  This feature will be examined in more detail in a later post.

The results of this function are shown below:

Results from py_Complex1 and py_Complex2

Results from py_Complex1 and py_Complex2

Note that the complex number string in cell A27 is returned as a pair of values, and the other cells with numerical values are correctly returned with the cell value as the real part, and an imaginary value of zero.  The blank and Boolean values return a #VALUE! error, as does the string in cell A31, because it has spaces around the “+”, which are not allowed in a complex number string.

The function py_Complex2 is similar, except that the return type has been changed from a var to a numpy_row:

@xl_func("var x: numpy_row")
def py_Complex2(x):
    """
    Convert a complex number string to a pair of floats.
    x:  Complex number string or float.
    """
    crow = zeros(2)
    if isinstance(x, float) == False:
        if x[-1] == "i":
            x = x.replace("i", "j",1)
    c = complex(x)
    crow[0] = c.real
    crow[1] = c.imag
    return crow

A numpy_row is a 1D array created with the statement: crow = zeros(x), where x is the number of elements in the array.  Since it is a 1D array each element is specified with a single index, e.g. crow[0] = c.real.

The results for py_Complex2 are the same as py_Complex1, except that invalid input returns a #NUM! error, rather than #VALUE!

In py_Complex3 a check has been added to pick up invalid input data, so a more useful error message can be returned:

@xl_func("var x: var")
def py_Complex3(x):
    """
    Convert a complex number string to a pair of floats.
    x:  Complex number string or float.
    """
    try:
        crow = zeros(2)
        if isinstance(x, float) == False:
            if x[-1] == "i":
                x = x.replace("i", "j",1)
        c = complex(x)
        crow[0] = c.real
        crow[1] = c.imag
        return array([crow])
    except:
        return "Invalid input type"

In this case the return array has been created as a 1D numpy-row, using crow = zeros(2), but the return value may also be the string “Invalid input type”.  To handle these alternative data types the return value must be specified as a var, and the numpy_row is converted into a 2D array with: return array([crow]).

Note the use of the Python “try:, except:” keywords to handle the exception raised when the statement: c = complex(x) is applied to an x that is not a valid complex number string or a float.

py_Complex3 and py_Complex4

py_Complex3 and py_Complex4

In py_Complex4 the check of the input data type has been amended to first check for a float or a Boolean (in which case the value is returned), then check for a string.  A blank cell does not satisfy any of these conditions, and returns a value of zero.  The numpy_row return type has been used, which precludes returning a string, but this could be changed to a var, as in py_Complex3, to allow an error message to be returned for invalid string input.

@xl_func("var x: numpy_row")
def py_Complex4(x):
    """
    Convert a complex number string to a pair of floats.
    x:  Complex number string or float.
    """
    crow = zeros(2)
    if isinstance(x, float) or isinstance(x, bool):
        crow[0] = x
    elif isinstance(x, unicode):
        if x[-1] == "i":
            x = x.replace("i", "j",1)
        c = complex(x)
        crow[0] = c.real
        crow[1] = c.imag
    else:
        crow[0] = 0
    return crow

The use of different types of array is illustrated with three variants of a function to sum the values in each row of a 2D range, and return the results as a column array.

@xl_func("float[] x: float[]")
def SumRows1(x):
    """
    Sum rows of selected range.
    x:  Range to sum.
    """
    nrows = len(x)
    i = 0
    suml = [0.] * nrows
    for row in x:
       for j in range(0,3):
            suml[i] = suml[i] + row[j]
       i = i+1
    return transpose([suml])

@xl_func("var x: var")
def SumRows2(x):
    """
    Sum rows of selected range.
    x:  Range to sum.
    """
    nrows = len(x)
    i = 0
    suml = [0.] * nrows
    for row in x:
       for j in range(0,3):
            suml[i] = suml[i] + row[j]
       i = i+1
    return transpose([suml])

@xl_func("numpy_array x: numpy_column")
def SumRows3(x):
    """
    Sum rows of selected range.
    x:  Range to sum.
    """
    nrows = len(x)
    i = 0
    suml = zeros(nrows)
    for row in x:
       for col in row:
            suml[i] = suml[i] + col
       i = i+1
    return suml

This code uses two variants of a for loop:

  • for row in x: loops three times to return a list of 3 values for each of the three rows in x
  • for j in range(0, 3): loops three times returning the values 0, 1, 2
  • the j value is then used as the index in row[j] to return each value in each row

The row sum values are assigned to the list suml, which is finally returned as a 2D array with 3 rows and 1 column with the line: return transpose([suml])

SumRows1 and SumRows2 are similar, except that SumRows2 uses vars for both input and output rather than float arrays (float[]).

In SumRows3 the input is a numpy_array, and output a numpy_column.  suml is created as a 1D numpy array, using the zeros function, and because the return type has been declared as a numpy_column this can be returned without transposing.

The screenshot below shows that when the input array has numbers in every cell them the three functions produce identical results:

SumRows functions

SumRows functions

However if one more cells are blank SumRows2 returns an error, whreas SumRows1 and SumRows2 allocate a zero to the blank cell, and return the correct results.

Finally the convenience of use of the numpy routines is illustrated with a function to solve systems of linear equations with a single line of Python code:

@xl_func("numpy_array a, numpy_column x: numpy_column")
def py_solve(a, x):
	"""
        Solve a matrix equation ax = y.
	a: square matrix
	x: column x vector
	"""
	return solve(a, x)
py_solve

py_solve

Posted in Arrays, Excel, Link to Python, NumPy and SciPy, UDFs, VBA | Tagged , , , , , , , | 5 Comments

Python for VBA users – 3

In the previous post in this series I looked at translating a routine to solve quadratic equations from VBA to Python.  Using the same process I have also created Python code to solve Cubic equations.  The only added difficulty for the Cubic routine was that the VBA function included a call to a VBA function to sort the results in ascending order.  It would have been quite easy to translate the VBA routine, but since Python already includes a sort routine it was easier to use this, so the VBA function call:

BubbleSort CubicA, 1, 3

became in Python:

CubicA[0:3,0:1] = sort(CubicA[0:3,0:1],0)

Note that this is the numpy sort routine, operating on the first 3 rows of a numpy 2D array.

Having generated the code that was returning numerical results in the desired format, it was just necessary to check that the results were correct, however a quick check soon showed that they were not.

For the Quadratic routine a quick check through the code soon found the source of the error.  I had converted the VBA:

If (Abs(B) >= Abs(C)) Then
            E = 1# - (A / B) * (C / B)
            D = Sqr(Abs(E)) * Abs(B)
        Else
            E = A
            If (C < 0#) Then E = -A
            E = B * (B / Abs(C)) - E
            D = Sqr(Abs(E)) * Sqr(Abs(C))
        End If

To Python

     if (abs(B) >= abs(C)):
            E = 1. - (A / B) * (C / B)
            D = ((abs(E)) * abs(B))**0.5
        else:
            E = A
            if (C < 0.):
                 E = -A
             E = B * (B / abs(C)) - E
             D = ((abs(E)) * (abs(C))**0.5)**0.5

The Python math module has a sqrt function, so it would have been simpler to use that, but using the equivalent **0.5, it was just necessary to correct the placement of brackets and exponents:

     if (abs(B) >= abs(C)):
            E = 1. - (A / B) * (C / B)
            D = ((abs(E))**0.5) * abs(B)
        else:
            E = A
            if (C < 0.):
                E = -A
            E = B * (B / abs(C)) - E
            D = ((abs(E))**0.5) * (abs(C))**0.5

The problem with the cubic code was not so obvious, but some experimentation revealed that the problem was occurring when there were one real and two complex roots, and was caused because:

A_ = -sgnR * (abs(R_) + (R2 - Q3) ** 0.5) ** (1 / 3)

was always evaluating to 1 or -1, i.e. to -sgnR.  The problem was due to the fact that in Python version 2 division of two integers always returns an integer, so 1/3 = 0.  To get a decimal result in Python 2 at least one of the two values must be non-integer, so 1./3 = 0.3333333333333333, and yields the correct result.

In this case changing the one instance of 1/3 to 1./3 was sufficient to fix the problem, but a better solution can be achieved by adding:

from __future__ import division

to the top of the code module.  This line causes Python Version 2 to use the Version 3 rules for division, which yields a decimal result when two integers are divide, rather than an integer.

Results from the two functions are shown in the screenshot below:

py_Quad and py_CubicC results

py_Quad and py_CubicC results

Posted in Excel, Link to Python, Maths, UDFs, VBA | Tagged , , , , , , | 2 Comments

Unchained Unchained Melodies and Unchained Guitars

The Telescoper has a link to a short YouTube clip of the original and excellent version of The Unchained Melody (subsequently made famous by the Righteous Brothers), which led me to a slightly less conventional version:

And at the suggested links at the end of that I was taken to guitarist Pierre Bensusan:

And while I was browsing guitarists, this from The Little Unsaid caught my attention:

Listening to the Pierre Bensusan tracks, I wondered if he had ever played with John Renbourn.  It seems he has, and that Renbourn (and Bert Jansch) were some of his earliest influences:

Interview Double Bill Concert with John Renbourn (2002)

Posted in Bach | Tagged , , , | 1 Comment

Python for VBA users – 2

This post will look in more detail at translating the VBA Quadratic function (described in Python for VBA users – 1) into a Python function, linked to Excel via PyXll.  For the purposes of this exercise I created a new Python module called quadratic2.py, and copied the VBA code into the blank module, using the PyScripter text editor.

After the first pass through of the procedure given in the previous Python post the code looked like this:

@xl_func("numpy_array CoeffA: numpy_array")
def py_Quadratic2(CoeffA):
# Calculate the zeros of the quadratic a*z**2+b1*z+c.
# The quadratic formula, modified to avoid overflow, is used to find the
# larger zero if the zeros are real and both zeros are complex.
# The smaller real zero is found directly from the product of the zeros c/a.
    if CoeffA.shape[0] == 1:
        CoeffA = transpose(CoeffA)
    A = CoeffA[0]
    B1 = CoeffA[1]
    C = CoeffA[2]
    if (A == 0.):
        NR = 1.
        SR = 0.
        if (B1 != 0):
            SR = -C / B1
        LR = 0.
        SI = 0.
        LI = 0.
    elif (C == 0.):
        NR = 1.
        SR = 0.
        LR = -B1 / A
        SI = 0.
        LI = 0.
    else:
        # Compute discriminant avoiding overflow
        B = B1 / 2.
        if (abs(B) >= abs(C)):
            E = 1. - (A / B) * (C / B)
            D = ((abs(E)) * abs(B))**0.5
        else:
            E = A
            if (C < 0.):
                 E = -A
             E = B * (B / abs(C)) - E
             D = ((abs(E)) * (abs(C))**0.5)**0.5
         if (E >= 0.):
            # Real zeros
            NR = 2
            if (B >= 0.):
                D = -D
            LR = (-B + D) / A
            SR = 0.
            if (LR <> 0.):
                SR = (C / LR) / A
            SI = 0.
            LI = 0.
        else:
            # complex conjugate zeros
            SR = -B / A
            LR = SR
            SI = abs(D / A)
            LI = -SI<br />
    if NR == 1:
        QuadA[1, 1] = SR
        QuadA[3, 1] = NR
    else:
        if LR < SR:
            QuadA[1, 1] = LR
            QuadA[2, 1] = SR
            QuadA[3, 1] = NR
            if NR == 0:
                QuadA[1, 2] = LI
                QuadA[2, 2] = SI
                QuadA[3, 2] = 2
        else:
            QuadA[1, 1] = SR
            QuadA[2, 1] = LR
            QuadA[3, 1] = NR
            if NR == 0:
                QuadA[1, 2] = SI
                QuadA[2, 2] = LI
                QuadA[3, 2] = 2
    return QuadA

Note that the function argument is designed to accept a 3 cell range, which may be either a single row or a single column.  The logic for dealing with this has been changed in the Python function, for reasons that will discussed in a later post.  For now just observe that the contents of the three cell input range (CoeffA) are copied into variables A, B1 and C.

After saving this code, and re-loading PyXll in Excel (Using Add-ins, PyXll, Reload PyXll), I entered three quadratic coefficients (a, b and c in ax^2 + bx + c = 0) in adjacent cells, and the function in a blank cell:

=py_Quadratic2(G10:I10)

This returned a Name? message. Before a new module can be used it must be added to the pyxll.cfg file in the PyXll folder.  After adding quadratic2 to the modules list in the cfg file, and re-loading PyXll in Excel I was still getting the Name? message.

The problem this time could be found in the PyXll log file, which showed:

  • 2013-09-28 15:58:00,308 – ERROR : Error importing ‘quadratic2’: name ‘xl_func’ is not defined

The PyXll library must be imported at the top of any code module where it is used.  In addition I would be using maths functions, and also numpy arrays and the transpose function from the numpy library, so the following lines were added to the top of the code module:

from pyxll import xl_func
from math import *
from numpy import *

Saving and re-loading the code, the function now displayed: #NUM! indicating progress, but not quite there yet.

Looking at the log file again, the last message was:

“2013-09-28 16:42:25,936 – ERROR : if NR == 1:
2013-09-28 16:42:25,936 – ERROR : UnboundLocalError: local variable ‘NR’ referenced before assignment.

Looking through the code, you will see that the variable NR is assigned a value 1 if either A or C are equal to zero, but if both are non-zero we arrive at the statement “if NR == 1:” before NR has been assigned any value.  This can be corrected by simply inserting NR = 0 near the top of the code.

Saving and re-loading, the function still shows #NUM!, and the log file now records:

2013-09-28 16:51:23,957 – ERROR : QuadA[1, 1] = SR
2013-09-28 16:51:23,959 – ERROR : NameError: global name ‘QuadA’ is not defined

We need to create the array QuadA before we can assign values to it.  We will look at the Python alternatives for storing arrays (or array-like objects) in a later post; for now we will just create a 3×2 array using the numpy “zeros” function:

QuadA = zeros((3,2))

Note the double brackets required for a 2D array, and that zeros uses () rather than []. Also remember that all Python arrays (and lists, etc.) are zero based, whereas our VBA arrays were dimensioned as base 1.  All the array indices therefore should be reduced by 1:

    if NR == 1:
        QuadA[0, 0] = SR
        QuadA[2, 0] = NR
    else:
        if LR < SR:
            QuadA[0, 0] = LR
            QuadA[1, 0] = SR
            QuadA[2, 0] = NR
            if NR == 0:
                QuadA[0, 1] = LI
                QuadA[1, 1] = SI
                QuadA[2, 1] = 2
...

After making all the required changes, saving the module, and re-loading in Excel, we finally have a function that works:

py_quad

Posted in Excel, Link to Python, NumPy and SciPy, UDFs, VBA | Tagged , , , , , , | 11 Comments

Commenting a block of code in VBA

One way is to manually place an apostrophe at the start of every line in the block.

A much easier way (I just found out today) is:

  • Go to View-Toolbars-Customise
  • Select the Command tab
  • Select the Edit Category on the left
  • Drag the “Comment Block” and “Uncomment Block” icons onto your toolbar.

Now you can comment (or uncomment) as many lines as you like with a single click.

Posted in Excel, VBA | Tagged , , | 9 Comments