## Extracting text from an Internet table

For use in my Units spreadsheet (Units4Excel) I want to extract the unit conversion values from the comprehensive Wikipedia unit conversion table.

On Wikipedia the table looks like this:

and when copied and pasted as Unicode text into Excel, like this:

To convert the text in the final column to values the following changes are required:

• Skip the leading non-numeric characters,
• Remove spaces between numbers.
• Remove the final numbers enclosed in brackets, indicating the uncertainty in the final digits.
• Convert the exponent (e.g. x10-27) to Excel format (E-27). Note that a positive exponent displays as a 4 digit number, e.g. x1027.
• Convert the resulting text string to a number with double data type.
• Extract the units
• Skip references to notes.

The following code does the job:

``````Function ExtractVal(Txt As String, Optional ExtractUnits As Boolean = True)
' Extract number and optionally units from Wikipedia unit conversion factor table
' Remove preceding text and trailing notes, and variance value enclosed in ()
' Convert x10nn to Enn or x10-nn to E-nn
Dim n As Long, m As Long, char As String, ValStr As String, RtnA() As Variant, i As Long, strcode As Long

' Skip over any non-numeric text before the first number
If ExtractUnits Then
ReDim RtnA(1 To 2)
Else
ReDim RtnA(1 To 1)
End If
n = Len(Txt)

For i = 1 To n
char = Mid(Txt, i, 1)
If IsNumeric(char) Then Exit For
Next i
ValStr = char
m = i + 1

' Extract all numbers until the first non-numeric text
' Skip over single spaces followed by a number
For i = m To n
char = Mid(Txt, i, 1)
If IsNumeric(char) = False And char <> "." Then
If IsNumeric(Mid(Txt, i + 1, 1)) = False Or char <> " " Then Exit For
Else
ValStr = ValStr & char
End If
Next i
m = i + 1

' Skip over numbers enclosed in ()
char = Mid(Txt, i, 1)
If char = "(" Then
For i = m To n
char = Mid(Txt, i, 1)
If char = ")" Then
i = i + 1
Exit For
End If
Next i
End If

' Extract numbers following 'x10' and add to value string as Enn
' The 'x' in the Wikipedia table is Ascii code 215
char = Mid(Txt, i, 1)
strcode = Asc(char)
If strcode = 215 Then
m = i + 3
ValStr = ValStr & "E"
For i = m To n
char = Mid(Txt, i, 1)
strcode = Asc(char)
If strcode = 45 Then char = "-"
If IsNumeric(char) = False And char <> "-" Then Exit For
ValStr = ValStr & char
Next i
End If
' Convert value string to double and copy to RtnA
RtnA(1) = CDbl(ValStr)

' Extract unit text, skip over final note reference
If ExtractUnits Then
m = i + 1
ValStr = ""
For i = m To n
char = Mid(Txt, i, 1)
If char = "[" Then Exit For
ValStr = ValStr & char
Next i
' Copy unit to RtnA
RtnA(2) = ValStr
End If

ExtractVal = RtnA
End Function
``````

This function can be applied to the text at the top of the final column, and copied down, resulting in:

Columns AB and AC are the function output. Column AD uses the built in IsNumber function to confirm that the values in Column AB are indeed valid numbers.

This entry was posted in Excel, Newton, UDFs, VBA and tagged , , , , , , , . Bookmark the permalink.

### 3 Responses to Extracting text from an Internet table

1. Kirby says:

Nice trick to return more than one value from a function

Like

This site uses Akismet to reduce spam. Learn how your comment data is processed.