Regular Expressions in Excel

I recently discovered that Microsoft are working on Excel functions to work with Regular Expressions. A beta version is currently available to those on the “insider” program, but this functionality has long been available through VBA, and can also be accessed from Python, using pyxll.

Details of the new Excel functions are at:

https://insider.microsoft365.com/en-us/blog/new-regular-expression-regex-functions-in-excel

I have previously covered the VBA use of regular expressions in detail at:

https://newtonexcelbach.com/2014/11/30/extracting-numbers-from-text-and-regular-expressions/
https://newtonexcelbach.com/2014/12/08/more-on-regular-expressions/
https://newtonexcelbach.com/2017/03/11/extracting-numbers-from-text-update/

Detailed documentation of the Python code for regular expressions is at:

From the Analyst Cave tutorial I was able to very quickly write four Excel user defined functions using pyxll to access the Python code:

import re

@xl_func()
@xl_arg('regexp', 'str')
@xl_arg('txt', 'str')
def RESearch(regexp, txt, rtn = 3):
    """
Returns a Match object if there is a match for regexp anywhere in txt

:param regexp: Regular expression
:param txt: Text to be searched
:param rtn: Value to be returned; 1 = span array for match location, 2 = input txt, 3 = list of matches (default)
    """
    x = re.search(regexp, txt)
    try:
        # span() string group()
        if rtn == 1:
            return x.span()
        elif rtn == 2:
            return x.string
        else:
            return x.group()
    except:
        return "No match"

@xl_func()
@xl_arg('regexp', 'str')
@xl_arg('txt', 'str')
def REFindall(regexp, txt):
    """
Returns a list containing all matches 

:param regexp: Regular expression
:param txt: Text to be searched
    """
    x = re.findall(regexp, txt)
    try:
        return x
    except:
        return "No match"

@xl_func()
@xl_arg('regexp', 'str')
@xl_arg('txt', 'str')
def RESplit(regexp, txt):
    """
Returns a list where the string has been split at all matches 

:param regexp: Regular expression
:param txt: Text to be searched
    """
    x = re.split(regexp, txt)
    try:
        return x
    except:
        return "No match"

@xl_func()
@xl_arg('regexp', 'str')
@xl_arg('replace', 'str')
@xl_arg('txt', 'str')
def RESub(regexp, replace, txt):
    """
Replaces one or many matches with a string 

:param regexp: Regular expression
:param txt: Text to be searched
    """
    x = re.sub(regexp, replace, txt)
    try:
        return x
    except:
        return "No match"

Example results are shown below, using an example from the Microsoft site:

Posted in Excel, Link to Python, PyXLL, UDFs, VBA | Tagged , , , , , , | Leave a comment

Pale Blue Dot

34 years ago Carl Sagan presented an image of the Earth taken from outer space, known as the pale blue dot.

An event very different from a 13 year old boy singing a song he had written himself, but the message was the same:

Posted in Climate, Newton | Tagged , | Leave a comment

Rory Phillips – The Truth

“‘The Truth’ is a passionate call to action on climate change from 13 year old Rory Phillips. “I wanted to say something with this song. The lyrics are subtle, not in your face; but the message is clear. We need to do more, from the top down.”

Posted in Bach, Climate, Newton | Tagged , , | Leave a comment

Creating sequential lists in Excel

I recently discovered that Excel has a new SEQUENCE function that will return a sequence of increasing numbers in a column, row, or 2D range. Use of the new function is shown in the screenshot below, with some old and new alternatives.

The formula used for each example is shown above the output, using the FORMULATEXT function.

The function arguments are:

  • Number of rows
  • Number of columns (default = 1)
  • Start value (default = 1)
  • Increment (default = 1)

The start value and increment may be integers or floats, and may be entered as values, cell references, or formulas. The example in Column I illustrates the use of the ROW function to calculate the required number of rows. This has the advantage that if rows are inserted within the output range, the range will expand to include the new number.

An alternative to the new function is using the range fill functionality, or entering a simple addition formula and copying down the required number of rows. Use of the Fill, Series command (Column L) is shown below:

  • Enter the start value in the first cell, and with that cell selected, click the Fill icon on the Editing menu (rectangle with blue down arrow). Select Series to display the dialog as shown above and select “Series in Columns” and enter the step value and stop value.

Ranges can also be filled by entering the required values in the first two rows, selecting those two cells, then dragging down the bottom right hand corner of the selected range:

Finally, since Python has a range function that will generate a sequential list of integers, it was quite simple to call this from Excel, using pyxll:

@xl_func
@xl_arg('start', 'int')
@xl_arg('stop', 'int')
@xl_arg('step', 'int')
@xl_return('numpy_array<int>', ndim=1)
def py_Range(start, stop = '', step = 1):
    if type(stop) == str:
        return np.array(range(0, start))
    else:    
        return np.array(range(start, stop, step))

The last two examples in the first screenshot show the Python function results. Note that:

  • The function will always return a single column.
  • Input is the start and stop values, and optional step, rather than number of values and start value.
  • The default start value is zero, rather than 1.
  • The ‘stop’ value is the lowest value not included in the output.
Posted in Excel, Link to Python, PyXLL, UDFs | Tagged , , , , | Leave a comment

Let, Lambda, and py_Eval

I have covered the new Excel Let and Lambda functions before at:

Two New Functions; Let and Lambda

Eval and Let examples

Lambda and VBA

The Lambda function is now available to all Microsoft 365 subscribers, so I have now looked at some examples comparing Let and Lambda with my Python based user defined function, py_Eval.

The spreadsheet can be downloaded from: Let_Lambda.zip and the required Python code from: py_SciPy.zip. The Python function also requires pyxll to connect to Excel. See python-and-pyxll/ for details.

The screenshots below show examples of the three functions used to evaluate the same formulas.

py_Eval takes three arguments: a cell reference with the formula to be evaluated and two column ranges with the variable symbols and their associated values.

Let input is alternate variable names and values followed by the formula as the last argument. Not that the formula must be entered directly in the function input:

The first Lambda option is to enter the variable names followed by the formula, then the variable values are entered in a separate set of brackets:

The second Lambda option is to enter the function in the name manager and give it a name:

The function then operates as a User Defined Function (UDF) with the allocated name:

The py_Eval function calls Python code and will therefore accept any available Python function as part (or all) of the formula:

In the next example the initial input is the X and Y components of a beam length, which are used to calculate the angle of the beam to the X axis using py_Eval. Note that the function must use the Python arctan2 function (rather than ATan2) and the order of the input is dy, dx, rather than dx, dy:

The angle theta is then used by py_Eval with the axial and shear forces to evaluate the formulas in cells A17 and A18:

Using Let, the input arguments can define a variable used by a later argument, so in this case the values dx and dy are input and used to calculate theta (using ATan2), which is then used in the final formula:

Lambda does not allow the calculation of arguments within the function definition, so the theta value calculated on the spreadsheet must be used as input, as for py_Eval:

In the next example py_Eval is first used to evaluate a formula with 5 variables in the usual way:

py_Eval can also return a formula with the variables replaced by their assigned values,, by setting the optional “out” argumant to zero. This formula is evaluated by py_Eval, omitting the variable and value arguments:

The formula in cell A 34 may also be evaluated by Let or Lambda, but they both require at least one argument to be defined, even though it isn’t used:

Finally an example of a very long formula (from a discussion at Eng-Tips), shown with the original formula evaluated, the formula variables converted to values, then this formula evaluated. The original formula used the Excel LN function (natural logarithm), which has been changed to the Python equivalent, log:

Posted in Excel, Link to Python, Maths, Newton, NumPy and SciPy, PyXLL, UDFs | Tagged , , , , , , , | Leave a comment