python pandas rolling function with two arguments in a grouped DataFrame

Question

This is an extension of my previous question, "python pandas rolling function with two arguments".

How do I perform the same by group? Let's say that the 'C' column below is used for grouping.

I am struggling to:

1. Group by column 'C'.
2. Within each group, sort by 'A'.
3. Within each group, apply a rolling function that takes two arguments, like kendalltau, to columns 'A' and 'B'.

The expected result would be a DataFrame like the one below:

I have been trying the 'pass an index' workaround described in the link above, but the complexity of this case is beyond my skills :-( . This is a toy example, not that far from what I am working with, so for simplicity I used randomly generated data.

import numpy as np
import pandas as pd
import scipy as sp
import scipy.stats

rand = np.random.RandomState(1)
dff = pd.DataFrame({'A' : np.arange(20),
                    'B' : rand.randint(100, 120, 20),
                    'C' : rand.randint(0, 2, 20)})

def my_tau_indx(indx):
    x = dff.iloc[indx, 0]
    y = dff.iloc[indx, 1]
    tau = sp.stats.mstats.kendalltau(x, y)[0]
    return tau

dff['tau'] = dff.sort_values(['C', 'A']).groupby('C').rolling(window = 5).apply(my_tau_indx, args = ([dff.index.values]))

Every fix I make creates yet another bug...

The above issue has been solved by Nickil Maveli. His solution works with numpy 1.11.0, pandas 0.18.1, scipy 0.17.1, and conda 4.1.4. It generates some warnings, but it works.

On another machine with the latest and greatest numpy 1.12.0, pandas 0.19.2, scipy 0.18.1, conda 3.10.0 and BLAS/LAPACK, it does not work and I get the traceback below. This seems to be version-related: once I upgraded the first machine, it stopped working there too... In the name of science... ;-)

As Nickil suggested, this was due to an incompatibility between numpy 1.11 and 1.12. Downgrading numpy helped. Since I use BLAS/LAPACK on Windows, I installed numpy 1.11.3+mkl from http://www.lfd.uci.edu/~gohlke/pythonlibs/ .

Traceback (most recent call last):

File "<ipython-input-4-bbca2c0e986b>", line 16, in <module>
t = grp.apply(func)

File "C:\Apps\Anaconda\v2_1_0_x64\envs\python35\lib\site-packages\pandas\core\groupby.py", line 651, in apply
return self._python_apply_general(f)

File "C:\Apps\Anaconda\v2_1_0_x64\envs\python35\lib\site-packages\pandas\core\groupby.py", line 655, in _python_apply_general
self.axis)

File "C:\Apps\Anaconda\v2_1_0_x64\envs\python35\lib\site-packages\pandas\core\groupby.py", line 1527, in apply
res = f(group)

File "C:\Apps\Anaconda\v2_1_0_x64\envs\python35\lib\site-packages\pandas\core\groupby.py", line 647, in f
return func(g, *args, **kwargs)

File "<ipython-input-4-bbca2c0e986b>", line 15, in <lambda>
func = lambda x: pd.Series(pd.rolling_apply(np.arange(len(x)), 5, my_tau_indx), x.index)

File "C:\Apps\Anaconda\v2_1_0_x64\envs\python35\lib\site-packages\pandas\stats\moments.py", line 584, in rolling_apply
kwargs=kwargs)

File "C:\Apps\Anaconda\v2_1_0_x64\envs\python35\lib\site-packages\pandas\stats\moments.py", line 240, in ensure_compat
result = getattr(r, name)(*args, **kwds)

File "C:\Apps\Anaconda\v2_1_0_x64\envs\python35\lib\site-packages\pandas\core\window.py", line 863, in apply
return super(Rolling, self).apply(func, args=args, kwargs=kwargs)

File "C:\Apps\Anaconda\v2_1_0_x64\envs\python35\lib\site-packages\pandas\core\window.py", line 621, in apply
center=False)

File "C:\Apps\Anaconda\v2_1_0_x64\envs\python35\lib\site-packages\pandas\core\window.py", line 560, in _apply
result = calc(values)

File "C:\Apps\Anaconda\v2_1_0_x64\envs\python35\lib\site-packages\pandas\core\window.py", line 555, in calc
return func(x, window, min_periods=self.min_periods)

File "C:\Apps\Anaconda\v2_1_0_x64\envs\python35\lib\site-packages\pandas\core\window.py", line 618, in f
kwargs)

File "pandas\algos.pyx", line 1831, in pandas.algos.roll_generic (pandas\algos.c:51768)

File "<ipython-input-4-bbca2c0e986b>", line 8, in my_tau_indx
x = dff.iloc[indx, 0]

File "C:\Apps\Anaconda\v2_1_0_x64\envs\python35\lib\site-packages\pandas\core\indexing.py", line 1294, in __getitem__
return self._getitem_tuple(key)

File "C:\Apps\Anaconda\v2_1_0_x64\envs\python35\lib\site-packages\pandas\core\indexing.py", line 1560, in _getitem_tuple
retval = getattr(retval, self.name)._getitem_axis(key, axis=axis)

File "C:\Apps\Anaconda\v2_1_0_x64\envs\python35\lib\site-packages\pandas\core\indexing.py", line 1614, in _getitem_axis
return self._get_loc(key, axis=axis)

File "C:\Apps\Anaconda\v2_1_0_x64\envs\python35\lib\site-packages\pandas\core\indexing.py", line 96, in _get_loc
return self.obj._ixs(key, axis=axis)

File "C:\Apps\Anaconda\v2_1_0_x64\envs\python35\lib\site-packages\pandas\core\frame.py", line 1908, in _ixs
label = self.index[i]

File "C:\Apps\Anaconda\v2_1_0_x64\envs\python35\lib\site-packages\pandas\indexes\range.py", line 510, in __getitem__
return super_getitem(key)

File "C:\Apps\Anaconda\v2_1_0_x64\envs\python35\lib\site-packages\pandas\indexes\base.py", line 1275, in __getitem__
result = getitem(key)

IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices

The final check:

@Andrew L - thank you, I wrongly assumed this could be inferred. I hope it is clearer now.
– rpl, Commented Jan 18, 2017 at 10:46

Nickil Maveli · Accepted Answer · 2017-01-21 08:03:35Z

1

One way to achieve this would be to iterate over the groups and use pd.rolling_apply on each of them.

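import numpy as np
import pandas as pd
import scipy.stats as ss

# dff is the DataFrame built in the question (default RangeIndex)
def my_tau_indx(indx):
    # indx holds row positions of dff; look up both columns for that window
    x = dff.iloc[indx, 0]
    y = dff.iloc[indx, 1]
    tau = ss.mstats.kendalltau(x, y)[0]
    return tau

# Sort, then group by 'C'; group_keys=False keeps the original row labels
grp = dff.sort_values(['A', 'C']).groupby('C', group_keys=False)
# Roll over each group's own row labels, which equal dff's positions here
# (np.arange(len(x)) would point at the first rows of dff, not at the group)
func = lambda x: pd.Series(pd.rolling_apply(x.index.values, 5, my_tau_indx), x.index)
t = grp.apply(func)
dff.reindex(t.index).assign(tau=t)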

EDIT:

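# Here rolling().apply() passes each window of column values to my_tau_indx;
# the lookup relies on the 'A' values coinciding with dff's row labels.
# Note: .ix was deprecated in pandas 0.20 and removed in pandas 1.0.
def my_tau_indx(indx):
    x = dff.ix[indx, 0]
    y = dff.ix[indx, 1]
    tau = ss.mstats.kendalltau(x, y)[0]
    return tau

grp = dff.sort_values(['A', 'C']).groupby('C', group_keys=False)
# Keep only the result computed from the 'A' windows
t = grp.rolling(5).apply(my_tau_indx).get('A')

grp.head(dff.shape[0]).reindex(t.index).assign(tau=t)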
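
On more recent pandas releases, where pd.rolling_apply and .ix are no longer available, the same "roll over positions" idea can probably be sketched with Series.rolling(...).apply(..., raw=True). This is a minimal sketch under stated assumptions, not a drop-in replacement: rolling_tau is a hypothetical helper name, scipy.stats.kendalltau is used instead of the mstats variant, and each window is cast back to integers because rolling hands the function float arrays.

```python
import numpy as np
import pandas as pd
from scipy.stats import kendalltau

# Same toy data as in the question
rand = np.random.RandomState(1)
dff = pd.DataFrame({'A': np.arange(20),
                    'B': rand.randint(100, 120, 20),
                    'C': rand.randint(0, 2, 20)})

def rolling_tau(group, window=5):
    """Rolling Kendall tau of A vs. B within one group (hypothetical helper)."""
    group = group.sort_values('A')
    a = group['A'].to_numpy()
    b = group['B'].to_numpy()

    def tau_at(pos):
        # The window arrives as floats, so cast back to integer positions
        idx = pos.astype(int)
        tau, _ = kendalltau(a[idx], b[idx])
        return tau

    # Roll over positions within the group, not over the values themselves
    positions = pd.Series(np.arange(len(group)), index=group.index)
    return positions.rolling(window).apply(tau_at, raw=True)

# One Series per group, stitched back together on the original row labels
tau = pd.concat([rolling_tau(g) for _, g in dff.groupby('C')])
result = dff.assign(tau=tau).sort_values(['C', 'A'])
print(result)
```

Because a and b are captured from the group passed in, the helper does not depend on a global DataFrame.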

edited Jan 21, 2017 at 8:03

answered Jan 18, 2017 at 12:11

Nickil Maveli


10 Comments

rpl Over a year ago

Thank you for posting a solution. Since pd.rolling_apply is going to be deprecated, I wonder if there is a way to achieve the same with rolling? Also (this is more out of curiosity), does a solution come to mind that does not rely on a global variable?

Nickil Maveli Over a year ago

I don't think DF.rolling().apply() is capable of returning a scalar value from a custom function in its current form. An alternative would be to redesign this using a sliding-window list comprehension and then concatenate the resulting computations row-wise, which seems like too much effort. It's better to stick with pd.rolling_apply() for now and wait until an improved version is released, or to post an issue on GitHub raising this concern.
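
The sliding-window list comprehension mentioned above could look roughly like this. It is only a sketch: window_tau is a made-up helper name, scipy.stats.kendalltau stands in for the mstats variant, and each group is passed in explicitly, so nothing depends on a global variable.

```python
import numpy as np
import pandas as pd
from scipy.stats import kendalltau

# Same toy data as in the question
rand = np.random.RandomState(1)
dff = pd.DataFrame({'A': np.arange(20),
                    'B': rand.randint(100, 120, 20),
                    'C': rand.randint(0, 2, 20)})

def window_tau(group, window=5):
    """Kendall tau of A vs. B over a sliding window within one group."""
    group = group.sort_values('A')
    a = group['A'].to_numpy()
    b = group['B'].to_numpy()
    # The first window - 1 rows have no complete window yet
    taus = [np.nan] * min(window - 1, len(group))
    # One tau per complete window; the window covers rows i - window .. i - 1
    taus += [kendalltau(a[i - window:i], b[i - window:i])[0]
             for i in range(window, len(group) + 1)]
    return pd.Series(taus, index=group.index, name='tau')

# Compute each group separately, then concatenate the pieces row-wise
pieces = [window_tau(g) for _, g in dff.groupby('C')]
result = dff.join(pd.concat(pieces)).sort_values(['C', 'A'])
print(result)
```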

Nickil Maveli Over a year ago

The IndexError is due to the incompatibility between the 1.11 and 1.12 versions of numpy. I'll try to find a fix if it's possible; otherwise you can downgrade just your numpy for now and it should work.
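
One plausible reading of the traceback, offered as an assumption rather than a confirmed diagnosis: pandas passes rolling windows to the applied function as float arrays, and NumPy 1.12 turned indexing with floats from a deprecation warning into an IndexError. The sketch below (record_dtype is a made-up helper) illustrates both halves on current versions, and shows that casting the window back to integers is one way to sidestep the error.

```python
import numpy as np
import pandas as pd

# 1) Record the dtype of every window that rolling().apply() hands over
seen = []

def record_dtype(window):
    seen.append(window.dtype)
    return 0.0

pd.Series(np.arange(10)).rolling(3).apply(record_dtype, raw=True)
print(set(seen))  # the integer input arrives as floating-point windows

# 2) Indexing a NumPy array with floats raises IndexError on NumPy >= 1.12
values = np.arange(10, 20)
float_idx = np.array([0.0, 1.0, 2.0])
try:
    values[float_idx]
    raised = False
except IndexError as exc:
    raised = True
    print('IndexError:', exc)

# 3) Casting the positions back to integers makes the lookup valid again
picked = values[float_idx.astype(int)]
print(picked)
```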

rpl Over a year ago

Thank you for pointing to the exact module! I downgraded and it runs fine! If only I could, I would upvote you 100 times! Thank you again!

rpl Over a year ago

Since I am porting this code from R, I'll use it to check the calculations. There may be some differences due to numerical issues. For example, I remember seeing some differences between R and JMP when I was developing R code using JMP results as the guide.
