Extracting selected data with array functions.

This post is in response to a query at the Eng-tips forum asking how to extract selected data from a range, so that the three best scores for each individual could be extracted from a two column Excel table listing multiple individuals, each with several different scores.

My response was based on an article by Chip Pearson at: http://www.cpearson.com/excel/ArrayFormulas.aspx

If the names are in B6:B105 the scores are in C6:C105 and a selected name is in E6 then:
(IF($B$6:$B$105=$E6,$C$6:$C$105,FALSE)
will return an array of 100 items containing either a score for the selected name or FALSE.

You can then wrap the Large() function around that:
=LARGE(IF($B$6:$B$105=$E6,$C$6:$C$105,FALSE),F$5)
where F5 contains the rank you want.  An example is shown below, listing the three best scores of Thomas, Richard and Harold:

Array Function Example

Note that because this is an array formula it must be entered by pressing “Ctrl-Shift-Enter”. If you just press “Enter” it returns 0.

The working of the formula is shown below, with the results of the formula:
(IF($B$6:$B$105=$E6,$C$6:$C$105,FALSE)

Array of scores for Thomas

Note that the array includes one entry for every row of the original table, containing either a score for Thomas, or FALSE if the row is data for another person.  This is not a problem for use with the Large() function, but if a continuous list of scores for reach individual is required then further work is required.

In many cases simply sorting the table, using the Excel filter functions, or using a pivot table would be the easiest way to go, but if an automatically updating array formula is required it is possible to wrap some additional functions around the solution presented above, such as that shown at The Get Digital Help blog (modified version shown below):

=IF($E$5=$B$5:$B$104,ROW($C$5:$C$104)-ROW($C$5)+1,FALSE)
will return an array of row numbers for rows containing scores for Thomas, with FALSE in the other rows.

=SMALL(IF($E$5=$B$5:$B$104,ROW($C$5:$C$104)-ROW($C$5)+1,FALSE),ROW(B1:B99))
Extracts the row numbers for Thomas from this array, without the FALSE values, and:

=INDEX(C5:C104,SMALL(IF($E$5=$B$5:$B$104,ROW($C$5:$C$104)-ROW($C$5)+1,FALSE),ROW(B1:B99)))
extracts the actual score from those row numbers.

In this case the first approach is not only much shorter, it is also much more convenient because before the Large() function can be used on the array returned by the second method the length of the array needs to be determined.

Once again, remember all the formulas presented here must be entered as array formulas.  See Using Array Formulas for more details.  Failure to enter as an array will sometimes result in an error message, but can return incorrect values, so beware.

Edit 19th June 12:  The comment from oli below is worth a closer look.  The basic technique is to replace the IF statement with an array multiplication:

($B$5:$B$104=E5) returns an array of TRUE and FALSE values.  When these are multiplied by the values in C5:C104 it returns an array of the values in column C or 0 (since TRUE and FALSE are treated as equal to 1 and 0 in numerical calculations), and then the Large function may be used directly on that array.  This is a very powerful technique, with many applications, the only proviso being that it is not suitable where the data may include 0 values, since it will be impossible to differentiate between this value and the values that do not match the selection criteria.

The other difference is that the Large function has also been applied as an array formula, so that rather than array entering the formula in a single cell and copying it to the two adjacent cells it must be array entered with the three adjacent cells selected.  It will then return the top three scores for the chosen person.

This entry was posted in Arrays, Excel and tagged , . Bookmark the permalink.

5 Responses to Extracting selected data with array functions.

  1. Gary K says:

    I had done something similar for a golf score spreadsheet a few years back. my brother needed a running average of the lowest 3 scores from the previous 5 weeks.
    this is the array formula I came up with. the scores were in columns B-T:
    =AVERAGE(SMALL(IF(OFFSET(B9:T9,0,MATCH(1E+300,B9:T9)-5,1,5)=0,MAX(B9:T9),OFFSET(B9:T9,0,MATCH(1E+300,B9:T9)-5,1,5)),{1,2,3}))

    Liked by 1 person

  2. The array formula is much more convenient than using the INDEX formula.
    But what if Thomas only has two scores? How do you leave his third score blank?

    Like

  3. Oli says:

    Does this meet your criteria? =LARGE((($B$4:$B$38=E4)*$C$4:$C$38),{1,2,3})

    Like

  4. dougaj4 says:

    Gary – I suspect that there may be a simpler way, but if that does the trick, that’s all good. My only comment is that I hope the number in the match functions is not a comment on your brother’s golf playing abilities 🙂

    Catia – If there are only 2 scores for any individual the third score is returned as a #NUM! error. That seems OK to me, but if you prefer a zero you could replace the FALSE in the formula with 0, or use Oli’s alternative. If you must have a blank the easiest way would be to set up a second table for display purposes with a simple IF function to return a blank for any cells with a #NUM! or 0 display.

    Oli – very neat. That seems to give the same results, other than returning a 0 rather than a #NUM! if there are fewer than the required number of results. This is a very useful technique, and I’ll add a comment about it to the main post.

    Like

  5. Thank You ! Very much It was very useful.

    Like

Leave a comment

This site uses Akismet to reduce spam. Learn how your comment data is processed.