Select and sum data using the scripting dictionary

Another User Defined Function (UDF) prompted by a query at Eng-Tips.

The question concerned a set of data consisting of 13 columns, which was to be divided into two sets, based on the contents of Column 4, then the contents of columns 6 to 11 were to be summed for each row with identical contents in columns 12 and 13.

The first attempt copied the data into two arrays (based on the contents of Column 4), then did a nested loop, for each row of Array1 looping through all of Array2, and adding the contents of Array2 to the current row of Array1, when columns 12 and 13 were identical in each array.  This worked, but was extremely slow, taking about 4 minutes for 50,000 rows of data.

One way to speed up the process would be to sort both arrays based on the contents of columns 12 and 13, then exit each loop when the columns no longer matched.  An easier and more efficient way though is to create a dictionary object; the algorithm is:

  • Copy the data into two arrays, based on the contents of Column 4
  • Create a dictionary object of Array2, using a key formed from the combined text of columns 12 and 13, and an Item value of the row number.
  • Whenever a later row in Array2 is found matching an existing dictionary key, the values of columns 6 to 11 are added to the matching row.
  • Loop through Array1, and add the contents of columns 6 to 11 from the first matching row in Array 2, using the dictionary to identify the row number

The revised routine reduced the time for 50,000 rows from more than 4 minutes to less than 1 second!

The spreadsheet can be downloaded from SelectSum.xlsb, including full open source code.

The function SumSelectD can be used as an array function directly on the spreadsheet, or run the routine CopySumDict to use the data on sheet1, and copy the results to sheet2.  Note that the subroutine needs three named ranges:

  • TLD: the top left cell of the input data
  • Criteria: five cells in a column, defining the selection criteria
  • Results:  A range of two or more cells at the top left of the output range (the subroutine automatically resizes this range to accommodate the output data).

Also note that the code includes the original, very slow, routines (SumSelect and CopySum).  These will work, but may take several minutes to process a large range of data.

Finally, the spreadsheet also includes another short UDF using the scripting dictionary, taken from Daily-Dose-of-Excel.  This UDF counts the number of unique items from a comma delimited list in a spreadsheet cell.  The download file includes an example, and the full code is shown below:

Function F_unique(c00)
Dim sn As Variant, j As Long, c01 As Long, Dict As Scripting.Dictionary
sn = Split(c00, ",")

Set Dict = New Scripting.Dictionary

With Dict
For j = 0 To UBound(sn)
c01 = .Item(Trim(sn(j)))
Next
F_unique = .Count
End With
End Function

See the DDoE post for an even shorter version.

F_Unique Function

F_Unique Function

 

Note that to use the scripting dictionary you need to create a reference to “Microsoft Scripting Runtime” under Tools-References in the Visual Basic Editor.

This entry was posted in Excel, UDFs, VBA and tagged , , , , . Bookmark the permalink.

2 Responses to Select and sum data using the scripting dictionary

  1. Pingback: Daily Download 33: Miscellaneous | Newton Excel Bach, not (just) an Excel Blog

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.