The previous post presented a User Defined Function (UDF) using regular expressions to extract all numbers, or a selected number, from any string (or a list of strings). In this post I will look more closely at the code used to achieve this, and present a general purpose UDF allowing regular expressions to be used on strings. A spreadsheet with examples of both functions, and full open-source code, can be downloaded from RegExpres.xlsb.
First the Regular Expression object must be created and given the appropriate criteria to return all numbers (0-9), the decimal character (“.”), the exponent symbol (“E”), and the plus and minus symbols (“+-“). It is also set to return all matching strings (Global = True) and to ignore the case of alphabetic characters (IgnoreCase = True):
Const ExpString As String = "E", pmString As String = "+-" 'Initialize variables Set RE = CreateObject("VBScript.RegExp") 'Criteria for Regular Expression REPatt = "[0-9" & DecString & ExpString & pmString & "]" With RE .Pattern = REPatt .Global = True .IgnoreCase = True End With
There follows some code to convert the input range to a variant array and count the number of rows; the numbers in each string are then extracted with the line:
Set Matches = RE.Execute(NumString)
As discussed in the previous post, for a string containing a single value, that is all that is required, but if there is more than one number all the number strings will be concatenated, and must be separated. This can be done by stepping through the original string and comparing the characters with those in the Matches variable, as shown below:
'Loop Matches collection and compare with each character of original string to build strings of all numbers in the sample string strNumber = "" NumMatches = Matches.Count NumChar = Len(NumString) j = 0 k = 1 PrevMatch = False For i = 1 To NumChar MatchString = Mid(NumString, i, 1) If PrevMatch = False And UCase(MatchString) = ExpString Then j = j + 1 ElseIf PrevMatch = False And (MatchString = "+" Or MatchString = "-") And _ IsNumeric(Mid(NumString, i + 1, 1)) = False Then j = j + 1 ElseIf MatchString <> IgnoreString Then Digit = Matches(j) If Digit = MatchString Then strNumber = strNumber + Matches(j) j = j + 1 PrevMatch = True ElseIf PrevMatch = True Then If Position = 0 Then OutA(Row, k) = CDbl(strNumber) ElseIf Position = k Then OutA(Row, 1) = CDbl(strNumber) Exit For End If strNumber = "" k = k + 1 PrevMatch = False End If End If Next i If PrevMatch = True Then If Position = 0 Then OutA(Row, k) = CDbl(strNumber) ElseIf k = Position Then OutA(Row, 1) = CDbl(strNumber) End If End If If k < Position Then OutA(Row, 1) = CVErr(xlErrNA) Next Row
Regular Expressions also have many other uses, and the Reg_Exp UDF allows them to be applied quickly and easily. The Regular Expression object has the methods “List”, “Test”, and “Replace”, and the screen-shots below show the use of these methods on a list of strings:
Extract all numbers
Extract all occurrences of the string “Node”
Find strings containing “Node”
Replace “Node” with “Number”
Find all four letter strings starting with “n” and ending with “e”
For more details of setting up regular expression patterns see:
VBA RegEx: How to Evaluate and Use “Regular Expressions”
Finally note that if you want to apply these UDFs to a range they must be entered as an Array Function to return all the results.
Very nice. I try to avoid Array Functions if possible. I would probably pass the original strings into an array, do any processing, storing results in new arrays and finally output the results using Range().Value = Application.Transpose(array)
Hi Winston, thanks for the comments.
I love array functions (as you may have noticed if you have had a browse through the blog), but I know that not everybody does. It’s easy to wrap them in a sub though, in fact I think I might make that the next post topic.
This is awesome, Doug. I’ve managed to miss it till now. Hey, what does the ‘Update Regular Expression’ button in the 2nd tab do?
Ah…it seems to be just copying the values from column E to column I as hard-coded values.
Also, I’ve noticed that the ExtractNums function seems to return a 2d array comprising of what you want in the left most column and a whole bunch of zeros in the other. For instance, =extractnums(A34,1) returns this:
I haven’t looked into the code yet, but imagine that’s not intentional?
Hi Jeff. The button runs a Sub, that does the same thing as the UDF (for the benefit of people who don’t like UDFs :)). It writes the results to a range called “subout”.
For the number extraction, there are two versions, ExtractNum, and ExtractNums. If you only want one number, use ExtractNum. If you want more than one, but with no zeros, ExtractNums has an optional MaxNum argument (default 10), so you can set that to the number you want.
Or you could add some code to count the maximum number and set the output array size accordingly.
By the way, Winston Snyder did all the hard work; I just converted his code into UDFs.
I just stumbled across an incredible article on Increasing Performance of Regular Expressions in VBA at a great Access VBA blog: http://bytecomb.com/increasing-performance-of-regular-expressions-in-vba/
Here’ a direct quote
This blog is gold.
LikeLiked by 1 person
Thanks Jeff. He seems to have gone quiet recently, but the last article posted there on collections of collections is a must read for me.
Thank you so much! Your post is going to save me many hours of frustration.