
I have a dataframe with a MultiIndex where the last level of the index is a date. I am trying to perform a rolling operation on the columns with a specific frequency. As I understand it, the usual pandas approach when you have a DatetimeIndex is to call rolling with a frequency string (for example '2D' if I wanted the window to be two days). Another suggested approach is to resample the DatetimeIndex and then apply rolling with the integer 2. Essentially, what I want is to group by all the index levels except the last one and then tell the rolling operation to use the last level for timedelta-based rolling. Below is an example to demonstrate this:

from datetime import datetime
import pandas as pd
multi_index = pd.MultiIndex.from_tuples([
    ("A", datetime(2017, 1, 1)), 
    ("A", datetime(2017, 1, 2)), 
    ("A", datetime(2017, 1, 3)), 
    ("A", datetime(2017, 1, 4)),
    ("B", datetime(2017, 1, 1)),
    ("B", datetime(2017, 1, 3)),
    ("B", datetime(2017, 1, 4))])
df = pd.DataFrame(index=multi_index, data={"colA": [1, 1, 1, 1, 1, 1, 1]})
print(df)
df.groupby([df.index.get_level_values(0), pd.Grouper(freq="1D", level=-1)]).sum().rolling(2).sum()

The above code does not create a row for (B, datetime(2017, 1, 2)), so the rolling sums will all be two.
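For reference, the rolling sum over the grouped frame comes out roughly like this (grouped is just an illustrative intermediate name); note that there is no (B, 2017-01-02) row, so every window of 2 sees two existing rows:

grouped = df.groupby([df.index.get_level_values(0),
                      pd.Grouper(freq="1D", level=-1)]).sum()
print(grouped.rolling(2).sum())
              colA
A 2017-01-01   NaN
  2017-01-02   2.0
  2017-01-03   2.0
  2017-01-04   2.0
B 2017-01-01   2.0
  2017-01-03   2.0
  2017-01-04   2.0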

One ugly way to get around this, which really only works if there is a group that has all the days, is to unstack, fillna and stack before rolling:

(df.groupby([df.index.get_level_values(0), pd.Grouper(freq="1D", level=-1)])
   .sum().unstack().fillna(0).stack().rolling(2).sum())

Needless to say, this is an ugly hack, slow and error-prone. Is there a nice way to achieve what I need here without extensive manipulation? Ideally some way to tell the grouper to use the timestamp level, or to fill missing values itself?
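For comparison, this is the single-index, frequency-string behaviour mentioned above, which is roughly what I would like within each group (a minimal sketch; single is just an illustrative frame):

# Minimal sketch on a plain DatetimeIndex: a '2D' window looks back two
# calendar days from each row, so gaps in the dates need no filler rows.
single = pd.DataFrame({"colA": [1, 1, 1]},
                      index=pd.DatetimeIndex(["2017-01-01",
                                              "2017-01-03",
                                              "2017-01-04"]))
print(single.rolling("2D").sum())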

1 Answer


You can use groupby + resample + fillna (requires pandas 0.19.0 or later):

from datetime import datetime
import pandas as pd

multi_index = pd.MultiIndex.from_tuples([
    ("A", datetime(2017, 1, 1)), 
    ("A", datetime(2017, 1, 2)), 
    ("A", datetime(2017, 1, 3)), 
    ("A", datetime(2017, 1, 4)),
    ("B", datetime(2017, 1, 1)),
    ("B", datetime(2017, 1, 3)),
    ("B", datetime(2017, 1, 4))])
df = pd.DataFrame(index=multi_index, data={"colA": [1, 2, 3, 4, 1, 2, 3]})
print(df)
              colA
A 2017-01-01     1
  2017-01-02     2
  2017-01-03     3
  2017-01-04     4
B 2017-01-01     1
  2017-01-03     2
  2017-01-04     3

b = df.groupby(level=0).resample('1D', level=1).sum().fillna(0).rolling(2).sum()
print(b)
              colA
A 2017-01-01   NaN
  2017-01-02   3.0
  2017-01-03   5.0
  2017-01-04   7.0
B 2017-01-01   5.0
  2017-01-02   1.0
  2017-01-03   2.0
  2017-01-04   5.0

4 Comments

Awesome answer, however I want the first B to be NaN (as it is a new group)... Using your code I was able to do that: `` df.groupby(level=0).resample('1D', level=1).sum().fillna(0).groupby(level=0).apply(lambda x: x.rolling(2).sum()) ``
But if I use df.groupby([df.index.get_level_values(0), pd.Grouper(freq="1D", level=-1)]).sum().unstack().fillna(0).stack().rolling(2).sum() then I get the same output.
You are right. Your answer perfectly did what I was asking. I upvoted obviously as it was very helpful (I am a noob here so my upvotes do not count apparently!) Thanks so much for your help!
I have a similar issue, just one slight modification: How can I get the rolling sum to 'reset', i.e. the first value in colA for 'B' (in frame b) should be NaN as opposed to 5? Essentially, calculate the rolling sum across the dates, for each item in level 0 of the index separately without overlap
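A possible sketch for the per-group reset asked about here, assuming pandas 0.24+ for droplevel: roll inside each level-0 group, then drop the duplicated key level that groupby.rolling prepends (filled and per_group are just illustrative names).

# Fill the daily frequency as in the answer, then roll within each group so
# the window never carries over from A into B.
filled = df.groupby(level=0).resample('1D', level=1).sum().fillna(0)
# groupby.rolling prepends the group key to the index, hence the droplevel(0).
per_group = filled.groupby(level=0).rolling(2).sum().droplevel(0)
print(per_group)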
