Sorting with Python and Numpy

Following the previous post, lets look at sorting functions in the Numpy library.

The spreadsheet xlwNumpy links to a variety of Numpy functions, including the sorting functions described below.  The spreadsheet and associated Python code can be downloaded from:

xlNumpy.zip

The Python code requires xlwings and Numpy to be installed, which are included in the (free) Anaconda Python default installation.

The screenshots below  show examples of the sorting functions:

xl_npsort uses the basic Numpy sort function.  The Numpy sort has two significant differences from the standard Python version:

  •  Each column is sorted independently.
  • Numpy arrays convert all elements to the same data type, so if the data contains one or more strings, all the numbers will be converted to strings as well.  This may change the sort results, since in an ascending numerical sort 3 will be sorted before 21, but in an alphabetic sort 21 will come first.

The xl_LexSort function uses Numpy lexsort, which performs an indirect sort, returning a list of index values for the sorted array, using the supplied “keys”.  In the Numpy function the keys are defined as:

keys : (k, N) array or tuple containing k (N,)-shaped sequences.
The k different “columns” to be sorted. The last column (or row if keys is a 2D array) is the primary sort key.

In the Excel version the entire data range is passed as single array, and the columns to be sorted are passed as a row of values.  Note that the primary key is listed first, and the list is base 1, not base 0.  The key list may be any length from 1 up to the number of column.

The xl_Sort_t function sorts a table using the lexsort function, providing similar functionality to the Python sort function, provided that all the data has the same type.

The lexsort results can also be used with the Excel built-in Index function.  This provides the same results as the xl_Sort_t function.  The xl_pysort function calls the standard Python sort, and can therefore handle table with mixed data types.

The xl_Argsort function returns the index values to sort an array in the same way as Numpy sort, i.e. each column is sorted separately.

xl_Msort calls the Numpy msort function, which has similar functionality to sort, except there is no provision for changing the axis of the sort.

xl_Sort_Complex sorts a two column array, treating each row as the real and imaginary parts of a complex number.

xl_Partition and xl_Argpartition split an array into two parts, with values below and above the kth value.

 

This entry was posted in Arrays, Excel, Link to Python, NumPy and SciPy, UDFs, VBA, xlwings and tagged , , , , , , , . Bookmark the permalink.

1 Response to Sorting with Python and Numpy

  1. Pingback: More Numpy with Excel and xlwings | Newton Excel Bach, not (just) an Excel Blog

Leave a comment

This site uses Akismet to reduce spam. Learn how your comment data is processed.