Select and sum data using the scripting dictionary

Another User Defined Function (UDF) prompted by a query at Eng-Tips.

The question concerned a set of data consisting of 13 columns, which was to be divided into two sets, based on the contents of Column 4, then the contents of columns 6 to 11 were to be summed for each row with identical contents in columns 12 and 13.

The first attempt copied the data into two arrays (based on the contents of Column 4), then did a nested loop, for each row of Array1 looping through all of Array2, and adding the contents of Array2 to the current row of Array1, when columns 12 and 13 were identical in each array.  This worked, but was extremely slow, taking about 4 minutes for 50,000 rows of data.

One way to speed up the process would be to sort both arrays based on the contents of columns 12 and 13, then exit each loop when the columns no longer matched.  An easier and more efficient way though is to create a dictionary object; the algorithm is:

  • Copy the data into two arrays, based on the contents of Column 4
  • Create a dictionary object of Array2, using a key formed from the combined text of columns 12 and 13, and an Item value of the row number.
  • Whenever a later row in Array2 is found matching an existing dictionary key, the values of columns 6 to 11 are added to the matching row.
  • Loop through Array1, and add the contents of columns 6 to 11 from the first matching row in Array 2, using the dictionary to identify the row number

The revised routine reduced the time for 50,000 rows from more than 4 minutes to less than 1 second!

The spreadsheet can be downloaded from SelectSum.xlsb, including full open source code.

The function SumSelectD can be used as an array function directly on the spreadsheet, or run the routine CopySumDict to use the data on sheet1, and copy the results to sheet2.  Note that the subroutine needs three named ranges:

  • TLD: the top left cell of the input data
  • Criteria: five cells in a column, defining the selection criteria
  • Results:  A range of two or more cells at the top left of the output range (the subroutine automatically resizes this range to accommodate the output data).

Also note that the code includes the original, very slow, routines (SumSelect and CopySum).  These will work, but may take several minutes to process a large range of data.

Finally, the spreadsheet also includes another short UDF using the scripting dictionary, taken from Daily-Dose-of-Excel.  This UDF counts the number of unique items from a comma delimited list in a spreadsheet cell.  The download file includes an example, and the full code is shown below:

Function F_unique(c00)
Dim sn As Variant, j As Long, c01 As Long, Dict As Scripting.Dictionary
sn = Split(c00, ",")

Set Dict = New Scripting.Dictionary

With Dict
For j = 0 To UBound(sn)
c01 = .Item(Trim(sn(j)))
Next
F_unique = .Count
End With
End Function

See the DDoE post for an even shorter version.

F_Unique Function

F_Unique Function

 

Note that to use the scripting dictionary you need to create a reference to “Microsoft Scripting Runtime” under Tools-References in the Visual Basic Editor.

Posted in Excel, UDFs, VBA | Tagged , , , , | 2 Comments

EvalA update and examples

The EvalA User Defined Function (UDF) has been included in the Eval2 and Eval-Integration spreadsheets for some time, but was not documented.  That has now been fixed, and both spreadsheets can be downloaded from Eval2.Zip (including full open-source code).

The original Eval UDF evaluates a function entered as text on the spreadsheet, substituting values for specified parameters.  See: https://newtonexcelbach.wordpress.com/2012/10/01/daily-download-15-evaluation-of-formulas-in-text/
for more details.

EvalA performs the same task, but also allows one or more variables to be replaced with a range of values, returning a column array of results.  This allows graphs of the functions to be plotted and edited quickly and easily.  Two examples are shown below:

EvalA

AS 3600 Creep Coefficient, k2

AS 3600 Creep Coefficient, k2

The function must be entered as an array function to return all the results:

  • Enter the function (or copy and paste) in the top cell of the output range.
  • Select the complete output range (including the top cell)
  • Press F2 (Edit) then Ctrl-Shift-Enter

See https://newtonexcelbach.wordpress.com/2011/05/10/using-array-formulas/ for more details.

Posted in Arrays, Charts, Excel, Maths, Newton, UDFs, VBA | Tagged , , , , | Leave a comment

SelectAv Function

Suppose you have a spreadsheet range containing numerical data, and would like to find the average for a number of the most recent values, ignoring the highest (and/or lowest) values from the selected set.

If you like using long array formulas on the spreadsheet, or just want to avoid using VBA, then the solution can be found at: Daily Dose of Excel.

On the other hand, if you would prefer a User Defined Function (UDF) to do the job, here is one that will do it:

Function SelectAv(DataRange As Variant, Optional SelectFrom As Long, _
Optional DiscardHigh As Long, Optional DiscardLow As Long) As Variant

Dim NumRows As Long, NumCols As Long, ResA() As Double, MaxVal As Variant
Dim MinVal As Variant, Val As Variant, ValA() As Variant, Total As Double, Count As Long
Dim i As Long, j As Long, k As Long, Discardk As Long
Const MaxFloat As Double = 1.79769313486231E+308, MinFloat As Double = -1.79769313486231E+308

    If TypeName(DataRange) = "Range" Then DataRange = DataRange.Value2
    NumRows = UBound(DataRange)
    NumCols = UBound(DataRange, 2)
    If SelectFrom = 0 Then SelectFrom = NumCols

    ReDim ResA(1 To NumRows, 1 To 1)
    
    For i = 1 To NumRows
        ReDim ValA(1 To SelectFrom)
        MaxVal = ""
        MinVal = ""
        Total = 0
        Count = 1

        ' Extract Selectfrom values, starting from right hand end
        For j = NumCols To 1 Step -1
            Val = DataRange(i, j)
            If Val <> "" Then
                ValA(Count) = Val
                  Count = Count + 1
                If Count > SelectFrom Then Exit For
            End If
        Next j
        Count = Count - 1
        ' Discard highest and/or lowest value(s) if required
        If Count > SelectFrom - DiscardLow - DiscardHigh Then
            If DiscardHigh > 0 Then
                For j = 1 To DiscardHigh
                    MaxVal = MinFloat
                    Discardk = 1
                    For k = 1 To SelectFrom
                        If ValA(k) <> "" And ValA(k) > MaxVal Then
                            MaxVal = ValA(k)
                            Discardk = k
                        End If
                    Next k
                    ValA(Discardk) = ""
                Next j
            End If
            If DiscardLow > 0 Then

                For j = 1 To DiscardLow
                    MinVal = MaxFloat
                    Discardk = 1
                    For k = 1 To SelectFrom
                        If ValA(k) <> "" And ValA(k) < MinVal Then
                            MinVal = ValA(k)
                            Discardk = k
                        End If
                    Next k
                    ValA(Discardk) = ""
                Next j
            End If
        End If
        ResA(i, 1) = WorksheetFunction.Average(ValA)

    Next i
    SelectAv = ResA
End Function

Results (using the same data as the DDofE example) are shown in the screenshot below:

SelectAv function finding the average of the best (lowest) four out of the 5 most recent scores.

SelectAv function finding the average of the best (lowest) four out of the 5 most recent scores.

The function arguments are:

=SelectAv(DataRange, SelectFrom , DiscardHigh , DiscardLow)

  • DataRange: Spreadsheet range containing the data
  • SelectFrom (optional): number of values, from the right, to be included.  Blank cells are ignored.
  • DiscardHigh (optional): Number of high values to be excluded from the average
  • DiscardLow (optional): Number of low values to be excluded from the average

If all optional values are omitted the function will return the same as the built in Average() function, except that it returns a column array, with one value for each row of the selected data range.  The function must be entered as an array function, using Ctrl-shift-enter.  See https://newtonexcelbach.wordpress.com/2011/05/10/using-array-formulas/ for details.

The example spreadsheet (including open source code) can be downloaded from: SelectAv.xlsb

Posted in Excel, Maths, UDFs, VBA | Tagged , , , | 3 Comments

Copy non-blank rows to another sheet

A thread at Eng-Tips asked for a way to display the data from Sheet2 on Sheet1, but only the rows that were not blank in Column A.

There are probably ways to do it without VBA, but it seems to me that the simplest and most efficient way is to write a User Defined Function (UDF) that returns an array with the required data.  The procedure is:

  • Read the data from the specified range on Sheet2 into a VBA variant array.
  • Create a new array of the same size.
  • Copy the data from the imported array into the new array, skipping the blank lines.
  • Write a blank (“”) into any spare rows at the bottom of the new array, so they don’t display as 0.

Here is the code:

Function NonBlanks(DataRange As Variant) As Variant
Dim i As Long, j As Long, NumRows As Long, NumCols As Long, RtnA() As Variant
Dim RtnRow As Long

If TypeName(DataRange) = "Range" Then DataRange = DataRange.Value2
NumRows = UBound(DataRange)
NumCols = UBound(DataRange, 2)

ReDim RtnA(1 To NumRows, 1 To NumCols)
For i = 1 To NumRows
If DataRange(i, 1) <> "" Then
RtnRow = RtnRow + 1
For j = 1 To NumCols
If DataRange(i, j) <> "" Then RtnA(RtnRow, j) = DataRange(i, j) _ 
Else RtnA(RtnRow, j) = ""
Next j
End If
Next i

For i = RtnRow + 1 To NumRows
For j = 1 To NumCols
RtnA(i, j) = ""
Next j
Next i

NonBlanks = RtnA
End Function

You can download a file with the NonBlanks function from: NonBlank.xlsb

And here is what it looks like:

NonBlank function

NonBlank function

See https://newtonexcelbach.wordpress.com/2011/05/10/using-array-formulas/ if you are not sure about using array functions.

Posted in Arrays, Excel, UDFs, VBA | Tagged , , , , | 3 Comments

Just discovered – Data Explorer

Browsing, following up Jeff Weir’s recent comments, led me to a Bacon Bits post about a new add-in from Microsoft called Data Explorer, for importing, transforming and merging data from a wide variety of sources.

Follow the Bacon Bits link for more details and download link, and Calling A Web Service From Data Explorer for another look at it from Chris Webb’s BI Blog.

Posted in Excel | Tagged , | 1 Comment