More on Regular Expressions

The previous post presented a User Defined Function (UDF) using regular expressions to extract all numbers, or a selected number, from any string (or a list of strings).  In this post I will look more closely at the code used to achieve this, and present a general purpose UDF allowing regular expressions to be used on strings.  A spreadsheet with examples of both functions, and full open-source code, can be downloaded from RegExpres.xlsb.

First the Regular Expression object must be created and given the appropriate criteria to return all numbers (0-9), the decimal character (“.”), the exponent symbol (“E”), and the plus and minus symbols (“+-“).  It is also set to return all matching strings (Global = True) and to ignore the case of alphabetic characters (IgnoreCase = True):


Const ExpString As String = "E", pmString As String = "+-"
'Initialize variables
Set RE = CreateObject("VBScript.RegExp")
'Criteria for Regular Expression
REPatt = "[0-9" & DecString & ExpString & pmString & "]"
With RE
.Pattern = REPatt
.Global = True
.IgnoreCase = True
End With

There follows some code to convert the input range to a variant array and count the number of rows; the numbers in each string are then extracted with the line:

Set Matches = RE.Execute(NumString)

As discussed in the previous post, for a string containing a single value, that is all that is required, but if there is more than one number all the number strings will be concatenated, and must be separated. This can be done by stepping through the original string and comparing the characters with those in the Matches variable, as shown below:

'Loop Matches collection and compare with each character of original string to build strings of all numbers in the sample string
        strNumber = ""
        NumMatches = Matches.Count
        NumChar = Len(NumString)
        j = 0
        k = 1
        PrevMatch = False
        For i = 1 To NumChar
            MatchString = Mid(NumString, i, 1)
            If PrevMatch = False And UCase(MatchString) = ExpString Then
                j = j + 1
            ElseIf PrevMatch = False And (MatchString = "+" Or MatchString = "-") And _
                   IsNumeric(Mid(NumString, i + 1, 1)) = False Then
                j = j + 1
            ElseIf MatchString <> IgnoreString Then
                Digit = Matches(j)
                If Digit = MatchString Then
                    strNumber = strNumber + Matches(j)
                    j = j + 1
                    PrevMatch = True
                ElseIf PrevMatch = True Then
                    If Position = 0 Then
                        OutA(Row, k) = CDbl(strNumber)
                    ElseIf Position = k Then
                        OutA(Row, 1) = CDbl(strNumber)
                        Exit For
                    End If
                    strNumber = ""
                    k = k + 1
                    PrevMatch = False
                End If
            End If
        Next i
        If PrevMatch = True Then
            If Position = 0 Then
                OutA(Row, k) = CDbl(strNumber)
            ElseIf k = Position Then
                OutA(Row, 1) = CDbl(strNumber)
            End If
        End If
        If k < Position Then OutA(Row, 1) = CVErr(xlErrNA)
    Next Row

Regular Expressions also have many other uses, and the Reg_Exp UDF allows them to be applied quickly and easily. The Regular Expression object has the methods “List”, “Test”, and “Replace”, and the screen-shots below show the use of these methods on a list of strings:

Extract all numbers

RegExp1

Extract all occurrences of the string “Node”

RegExp2

Find strings containing “Node”

RegExp3

Replace “Node” with “Number”

RegExp4

Find all four letter strings starting with “n” and ending with “e”

RegExp5

For more details of setting up regular expression patterns see:
VBA RegEx: How to Evaluate and Use “Regular Expressions”

Finally note that if you want to apply these UDFs to a range they must be entered as an Array Function to return all the results.

This entry was posted in Excel, UDFs, VBA and tagged , , , , . Bookmark the permalink.

9 Responses to More on Regular Expressions

  1. Doug,
    Very nice. I try to avoid Array Functions if possible. I would probably pass the original strings into an array, do any processing, storing results in new arrays and finally output the results using Range().Value = Application.Transpose(array)

    Like

    • dougaj4 says:

      Hi Winston, thanks for the comments.

      I love array functions (as you may have noticed if you have had a browse through the blog), but I know that not everybody does. It’s easy to wrap them in a sub though, in fact I think I might make that the next post topic.

      Like

  2. Jeff Weir says:

    This is awesome, Doug. I’ve managed to miss it till now. Hey, what does the ‘Update Regular Expression’ button in the 2nd tab do?

    Like

  3. Jeff Weir says:

    Ah…it seems to be just copying the values from column E to column I as hard-coded values.

    Also, I’ve noticed that the ExtractNums function seems to return a 2d array comprising of what you want in the left most column and a whole bunch of zeros in the other. For instance, =extractnums(A34,1) returns this:
    {10,0,0,0,0,0,0,0,0,0}

    I haven’t looked into the code yet, but imagine that’s not intentional?

    Like

    • dougaj4 says:

      Hi Jeff. The button runs a Sub, that does the same thing as the UDF (for the benefit of people who don’t like UDFs :)). It writes the results to a range called “subout”.

      For the number extraction, there are two versions, ExtractNum, and ExtractNums. If you only want one number, use ExtractNum. If you want more than one, but with no zeros, ExtractNums has an optional MaxNum argument (default 10), so you can set that to the number you want.

      Or you could add some code to count the maximum number and set the output array size accordingly.

      By the way, Winston Snyder did all the hard work; I just converted his code into UDFs.

      Like

  4. Jeff Weir says:

    I just stumbled across an incredible article on Increasing Performance of Regular Expressions in VBA at a great Access VBA blog: http://bytecomb.com/increasing-performance-of-regular-expressions-in-vba/
    Here’ a direct quote

    Behind the scenes, every regular expression pattern gets compiled into a tiny little program that executes in order to actually apply the regular expression. The first time you call any of the methods on a RegExp object there will be a short pause while the expression is compiled. Once the regular expression has been compiled, though, it can execute right away.

    The pause for compilation is usually on the order of a few tens of milliseconds. That’s small enough that you might never notice it even when stepping through code. But it can really add up when the expression is going to be applied to a few hundred thousand rows.

    If you’re using a RegExp object in a procedure while looping through a recordset programmatically you can just keep using the same RegExp object. But what about when the same pattern is used in independent calls to the same function — or even to different functions entirely? The solution is to cache all of your regular expressions objects, and serve them up using a simple factory pattern. In this article I’ll show how to do just that.

    This blog is gold.

    Liked by 1 person

    • dougaj4 says:

      Thanks Jeff. He seems to have gone quiet recently, but the last article posted there on collections of collections is a must read for me.

      Like

  5. SAGAR APSHANKAR says:

    Thank you so much! Your post is going to save me many hours of frustration.

    Like

Leave a comment

This site uses Akismet to reduce spam. Learn how your comment data is processed.