xlwings – dataframes and statistics

Another data conversion option offered by the latest xlwings release is Pandas dataframes and dataseries. This post will look at how to pass an Excel range as a dataframe or dataseries in a User Defined Function (UDF), and some simple statistical applications, but this is barely scratching the surface of the potential of the Pandas Python Data Analysis Library.

To use the Pandas data structures, import pandas and xlwings; the data type can then be defined with an xw.arg decorator:

import pandas as pd
import xlwings as xw 

@xw.arg('datrange', pd.DataFrame, index = 0, header = 0)    
def rtnpdframe(datrange):
    return datrange
    
@xw.arg('datrange', pd.Series, header = 0)    
def rtnpdseries(datrange):
    return datrange

The Python functions can then be called from VBA:

Function GetDFrame(DRange As Variant)
On Error GoTo rtnerr:
    GetDFrame = Py.CallUDF(ModName, "rtnpdframe", Array(DRange), ThisWorkbook)
    Exit Function
rtnerr:
    GetDFrame = Err.Description
End Function

Function GetDSeries(DRange As Variant)
On Error GoTo rtnerr:
    GetDSeries = Py.CallUDF(ModName, "rtnpdseries", Array(DRange), ThisWorkbook)
    Exit Function
rtnerr:
    GetDSeries = Err.Description
End Function

Use of these functions is shown in the screenshots below:

The xl_Corr UDF shown above is taken from the xlwings documentation:

@xw.func
@xw.arg('x', pd.DataFrame, index=False, header=False)
@xw.ret(index=False, header=False)    
def xl_Correl(x):
    return x.corr()

I have added a simple extension to allow an Excel UDF to call any of the dataframe methods:

@xw.func
@xw.arg('x', pd.DataFrame, index=False, header=False)
@xw.ret(index=False, header=False)    
def xl_Stats(x, stat):
    method = getattr(x, stat)
    return method()

Use of this UDF with four different statistics functions is shown below. In these examples the statistic is a single value, and the results for each of the three columns of the input range are returned as a single column with three rows.

The “describe” method returns 8 values for each column in the input range:

The Pandas documentation lists 20 common statistics functions available as dataframe methods. The screenshot below shows the use of the xl_Stats UDF, in conjunction with the Index function to call any of these 20 functions. In the example shown the cumsum function returns one row for each row of the input data (only the first three are shown):

The functions shown above have been added to:
xlwDict.zip
available for free download, with full open source code.

4 Responses to xlwings – dataframes and statistics

Pingback: xlwSciPy 1.7 | Newton Excel Bach, not (just) an Excel Blog
skip says:

14/12/2022 at 6:15 am

Hello, thank you very much. Its very helpful. I have done this cumsum UDF.

import pandas as pd
@xw.func
@xw.arg(‘x’, pd.DataFrame, index=False, header=False)
@xw.ret(index=False, header=False)
def xl_Correl_1(x):
return x.cumsum()

Do you think that there is any chance to modify this function to make calculation with numbers with many decimal places, using mpmath, or decimal?
Suppose I will need to do cumsum on numbers like this:

1.41421356237309504880168872421
1.41421356237309504880168872421
2.82842712474619009760337744842
4.24264068711928514640506617263

LikeLike

- dougaj4 says:
  
  15/12/2022 at 11:56 am
  
  As with your other query, you would have to enter the numbers in Excel as text strings, and transfer them to Python as text, then convert them to a suitable precision number format for the calculations, then back to a string for transfer back to Excel.
  I am not sure if Pandas dataframes have any restrictions on the number format. I’d suggest checking that before spending too much time on it.
  
  LikeLike
  
  - Skip says:
    
    15/12/2022 at 4:19 pm
    
    I see, this one works in python
    
    # Import pandas package
    import pandas as pd
    
    # import decimal library
    from decimal import *
    
    # set the number of decima places
    getcontext().prec = 30
    
    # create df from dictionary
    df = pd.DataFrame({‘num’: [2, 2, 2, 2, 2, 2, 2, 2]})
    
    #apply decimals on one column
    df[‘num’] = df[‘num’].apply(Decimal)
    
    # print results
    print(df.cumsum().ffill())
    
    LikeLike

	Scipy-statistics wit… on Scipy-statistics update
	dougaj4 on Spherical Geometry and Vincent…
	Pat Kirol on Spherical Geometry and Vincent…
	… and then py_… on py_xlCBA 0.04
	Alternative iterativ… on Newton-Raphson and Brent…
	Alternative iterativ… on Scipy functions with Excel and…
	Z on Downloads
	py_xlCBA – Sup… on py_xlCBA update
	dougaj4 on Downloads
	Z on Downloads
	py_xlCBA update \| Ne… on Calling PyCBA from Excel
	Z on Reinforced concrete elastic an…
	dougaj4 on Reinforced concrete elastic an…
	khoitsma on Continuous beam animations wit…
	Z on Reinforced concrete elastic an…