Extracting text from an Internet table

For use in my Units spreadsheet (Units4Excel) I want to extract the unit conversion values from the comprehensive Wikipedia unit conversion table.

On Wikipedia the table looks like this:

and when copied and pasted as Unicode text into Excel, like this:

To convert the text in the final column to values the following changes are required:

  • Skip the leading non-numeric characters,
  • Remove spaces between numbers.
  • Remove the final numbers enclosed in brackets, indicating the uncertainty in the final digits.
  • Convert the exponent (e.g. x10-27) to Excel format (E-27). Note that a positive exponent displays as a 4 digit number, e.g. x1027.
  • Convert the resulting text string to a number with double data type.
  • Extract the units
  • Skip references to notes.

The following code does the job:

Function ExtractVal(Txt As String, Optional ExtractUnits As Boolean = True)
' Extract number and optionally units from Wikipedia unit conversion factor table
' Remove preceding text and trailing notes, and variance value enclosed in ()
' Convert x10nn to Enn or x10-nn to E-nn
Dim n As Long, m As Long, char As String, ValStr As String, RtnA() As Variant, i As Long, strcode As Long

' Skip over any non-numeric text before the first number
    If ExtractUnits Then
        ReDim RtnA(1 To 2)
    Else
        ReDim RtnA(1 To 1)
    End If
    n = Len(Txt)
    
    For i = 1 To n
        char = Mid(Txt, i, 1)
        If IsNumeric(char) Then Exit For
    Next i
    ValStr = char
    m = i + 1
    
 ' Extract all numbers until the first non-numeric text
 ' Skip over single spaces followed by a number
    For i = m To n
        char = Mid(Txt, i, 1)
        If IsNumeric(char) = False And char <> "." Then
            If IsNumeric(Mid(Txt, i + 1, 1)) = False Or char <> " " Then Exit For
        Else
            ValStr = ValStr & char
        End If
    Next i
    m = i + 1
    
' Skip over numbers enclosed in ()
    char = Mid(Txt, i, 1)
    If char = "(" Then
        For i = m To n
            char = Mid(Txt, i, 1)
            If char = ")" Then
                i = i + 1
                Exit For
            End If
        Next i
    End If

' Extract numbers following 'x10' and add to value string as Enn
' The 'x' in the Wikipedia table is Ascii code 215
    char = Mid(Txt, i, 1)
    strcode = Asc(char)
    If strcode = 215 Then
        m = i + 3
        ValStr = ValStr & "E"
        For i = m To n
            char = Mid(Txt, i, 1)
            strcode = Asc(char)
            If strcode = 45 Then char = "-"
            If IsNumeric(char) = False And char <> "-" Then Exit For
            ValStr = ValStr & char
        Next i
    End If
 ' Convert value string to double and copy to RtnA
    RtnA(1) = CDbl(ValStr)
    
 ' Extract unit text, skip over final note reference
    If ExtractUnits Then
        m = i + 1
        ValStr = ""
        For i = m To n
            char = Mid(Txt, i, 1)
            If char = "[" Then Exit For
            ValStr = ValStr & char
        Next i
        ' Copy unit to RtnA
        RtnA(2) = ValStr
    End If
    
    ExtractVal = RtnA
End Function

This function can be applied to the text at the top of the final column, and copied down, resulting in:

Columns AB and AC are the function output. Column AD uses the built in IsNumber function to confirm that the values in Column AB are indeed valid numbers.

This entry was posted in Excel, Newton, UDFs, VBA and tagged , , , , , , , . Bookmark the permalink.

3 Responses to Extracting text from an Internet table

  1. Kirby says:

    Nice trick to return more than one value from a function

    Like

  2. Pingback: Units4Excel Update | Newton Excel Bach, not (just) an Excel Blog

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.