A Numpy trap – correction

In my post of 30th May this year (here) I said that:

As a check that the functions were working correctly, the Python functions were modified to return the sum of the largest array in the first row, revealing that the Numpy code was returning the wrong results!  …

It seems that the Numpy arange function uses 32 bit integers, even if the range extends outside the 32 bit range! 

That’s not quite right. In fact the range of Numpy 32 bit integers is -2,147,483,648 to 2,147,483,647 (which is the same as VBA Longs), and the largest value generated by the quoted arange function was only 100,000. The code generating an incorrect result (without generating an error message) was:

@xl_func
def numpysumn(k):
    a = np.arange(k)
    return a.sum()

With a k value of 100,000 the Numpy arange function generates a sequence from 1 to 99,000, so there is no problem generating the array, but the sum of the array members is 4,995,000,000, which exceeds the 32 bit integer limit.

Alternative solutions to this problem are:

  • Declare the arange function as a 64 bit integer:
    a = np.arange(k, dtype=np.int64)
  • Declare k as a 64 bit integer:
    k = np.int64(k)

In both cases the array a will have a 64 bit integer datatype, and a.sum will return the correct result.

This entry was posted in Arrays, Excel, Link to Python, NumPy and SciPy, PyXLL, UDFs and tagged , , , , , . Bookmark the permalink.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.