This post makes use of a spreadsheet for reading data from text files, including comma separated value (csv) files. The latest version includes a routine for saving the imported data in csv format. It can be downloaded (including full open-source code) from Text-in2.zip.
Excel allows csv files to be opened and saved directly from the File menu, without going through the File-Import process. Unfortunately the data is treated in the same way as text entered into a spreadsheet cell formatted as General, so any text that looks like a number will be stored as a numerical value, and displayed in the format that Excel deems appropriate. If the file is saved with the original name the data will be changed, which in some cases result in dates being changed, or numeric codes being converted into dates.
This is illustrated with the short csv file below, consisting of four columns:
– Numbers
– Text that looks like numbers
– Dates
– Text that looks like dates
The screenshot below shows the results of opening this file in Excel with default language set to Australian English, then saving it as csv, without making any changes.
It can be seen that changes have occurred in all the columns:
- Fractions are treated as dates; 3/4 becomes 3-Apr
- Leading zeros in numerical text strings are deleted
- Numerical text with a single E is treated as scientific notation.
- “” around a text string is deleted and ignored
- Any separator in a text string is converted to /.
- Four digit years are sometimes converted to two, and vice versa
- All the entries under “Look like dates” have been converted into dates, except for those with numbers separated by spaces.
Changing the default language to US English results in significantly different results for the dates, and strings interpreted as dates:
- The fractions (3/4) are converted to 4-Mar, rather than 3-Apr
- Date strings with two numbers are treated as mm/dd, rather than dd/mm, except when the first number is greater than 12.
- Numeric date strings with three numbers display with the day and month numbers unchanged, but are treated in Excel as being mm/dd, rather than dd/mm.
- Unambiguous date strings, are left with the day/month order unchanged.
CSV files can now be opened and saved without any unintended changes using the Spreadsheet Text-in2 (download link at the top of this post), which has three new features:
- Optional formatting of each column
- Text2Date() function to convert any date string into an Excel date value, optionally with a specified day/month/year sequence.
- Save imported data to a csv file.
If each column of a csv file is imported as text (format code = @), the data is all imported with no changes:
For simple applications numbers imported as text will be treated as numbers by Excel, but for many purposes they must be converted to numbers for everything to work as expected; for instance the Sum function treats all text as having value zer0. Any column containing numeric data can be imported as numbers by entering the appropriate format code; for example enter 0.00 to import as numbers and display to 2 decimal points. See the Text-in2 NumFormats sheet for more examples of number format codes.
The dates are imported as text, but can be converted to Excel date values using the Text2Date function.
When the imported data is saved to a csv file all text remains exactly as in the original file:
Text2Date optionally allows the date order to be specified, with the System local default being used when not specified. Date order options are:
0: mm/dd/yyyy (default for US English)
1: dd/mm/yyyy (default for non-US English)
2: yyyy/mm/dd
The function also has an optional Separator argument, which should be specified for any separator other than /.
The screenshot below illustrates use of Text2Date with DateOrder set to zero. Note that in this case the function returns an error message (“Not a valid date string”) if the first number is greater than twelve, or the second position has a month name, rather than a number:
Also note that in this case the format for column 1 was set to 0.00 (rather than @), so this column has been imported as numbers, rather than text. The fractions in rows 32 and 36 have been converted to decimal values, but the text in column 2 remains unchanged.
More details of the Text-in2 spreadsheet (plus other text functions) can be found at:
Daily Download 32: Text functions
Hi Doug,
The download file does not seem to have the column format option shown in the post.
Martin
LikeLike
Sorry, I have now uploaded the correct one. It should be version 1.08 (on the About sheet) and dated 10 June 2015.
LikeLike
Pingback: Importing Text Files; Unix Format | Newton Excel Bach, not (just) an Excel Blog
Pingback: VBA routines for splitting and joining text | Newton Excel Bach, not (just) an Excel Blog