For use in my Units spreadsheet (Units4Excel) I want to extract the unit conversion values from the comprehensive Wikipedia unit conversion table.
On Wikipedia the table looks like this:

and when copied and pasted as Unicode text into Excel, like this:

To convert the text in the final column to values the following changes are required:
- Skip the leading non-numeric characters,
- Remove spaces between numbers.
- Remove the final numbers enclosed in brackets, indicating the uncertainty in the final digits.
- Convert the exponent (e.g. x10-27) to Excel format (E-27). Note that a positive exponent displays as a 4 digit number, e.g. x1027.
- Convert the resulting text string to a number with double data type.
- Extract the units
- Skip references to notes.
The following code does the job:
Function ExtractVal(Txt As String, Optional ExtractUnits As Boolean = True)
' Extract number and optionally units from Wikipedia unit conversion factor table
' Remove preceding text and trailing notes, and variance value enclosed in ()
' Convert x10nn to Enn or x10-nn to E-nn
Dim n As Long, m As Long, char As String, ValStr As String, RtnA() As Variant, i As Long, strcode As Long
' Skip over any non-numeric text before the first number
If ExtractUnits Then
ReDim RtnA(1 To 2)
Else
ReDim RtnA(1 To 1)
End If
n = Len(Txt)
For i = 1 To n
char = Mid(Txt, i, 1)
If IsNumeric(char) Then Exit For
Next i
ValStr = char
m = i + 1
' Extract all numbers until the first non-numeric text
' Skip over single spaces followed by a number
For i = m To n
char = Mid(Txt, i, 1)
If IsNumeric(char) = False And char <> "." Then
If IsNumeric(Mid(Txt, i + 1, 1)) = False Or char <> " " Then Exit For
Else
ValStr = ValStr & char
End If
Next i
m = i + 1
' Skip over numbers enclosed in ()
char = Mid(Txt, i, 1)
If char = "(" Then
For i = m To n
char = Mid(Txt, i, 1)
If char = ")" Then
i = i + 1
Exit For
End If
Next i
End If
' Extract numbers following 'x10' and add to value string as Enn
' The 'x' in the Wikipedia table is Ascii code 215
char = Mid(Txt, i, 1)
strcode = Asc(char)
If strcode = 215 Then
m = i + 3
ValStr = ValStr & "E"
For i = m To n
char = Mid(Txt, i, 1)
strcode = Asc(char)
If strcode = 45 Then char = "-"
If IsNumeric(char) = False And char <> "-" Then Exit For
ValStr = ValStr & char
Next i
End If
' Convert value string to double and copy to RtnA
RtnA(1) = CDbl(ValStr)
' Extract unit text, skip over final note reference
If ExtractUnits Then
m = i + 1
ValStr = ""
For i = m To n
char = Mid(Txt, i, 1)
If char = "[" Then Exit For
ValStr = ValStr & char
Next i
' Copy unit to RtnA
RtnA(2) = ValStr
End If
ExtractVal = RtnA
End Function
This function can be applied to the text at the top of the final column, and copied down, resulting in:

Columns AB and AC are the function output. Column AD uses the built in IsNumber function to confirm that the values in Column AB are indeed valid numbers.
Nice trick to return more than one value from a function
LikeLike
Thanks for the comment.
It’s worth mentioning that the new “dynamic array” feature (that automatically displays the full extent of an array returned by any function or formula) is only available in Office 365.
See: https://newtonexcelbach.com/2019/12/08/excel-dynamic-arrays/
for more details.
Also see: https://newtonexcelbach.wordpress.com/2011/05/10/using-array-formulas/
for information on using array formulas in older versions of Excel.
LikeLike
Pingback: Units4Excel Update | Newton Excel Bach, not (just) an Excel Blog