Pandas filling missing dates and values within group with duplicate index values

Question

I am trying to fill in missing dates per user group. However, one of my index columns contains a duplicate date, so I tried re-indexing on the unique dates instead, and then I get a length mismatch error. How do I resample at daily frequency without getting the duplicate error?

import pandas

x = pandas.DataFrame({'user': ['a','a','b','b','a'], 'dt': ['2016-01-01','2016-01-02', '2016-01-05','2016-01-06','2016-01-06'], 'val': [1,33,2,1,2]})
udates = x['dt'].unique()
x['dt'] = pandas.to_datetime(x['dt'])
# fails with a length-mismatch ValueError: udates has 4 entries but x has 5 rows
dates = x.set_index(udates).resample('D').asfreq().index
users = x['user'].unique()
idx = pandas.MultiIndex.from_product((dates, users), names=['dt', 'user'])
result = x.set_index(['dt', 'user']).reindex(idx, fill_value=0).reset_index()
print(result)
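The error comes from `x.set_index(udates)`: `udates` holds only the 4 unique date strings (captured before the `to_datetime` conversion) while `x` has 5 rows, so it cannot serve as the index. Since each (dt, user) pair is already unique, the MultiIndex/reindex idea itself works if the full daily range is built directly with `pd.date_range` instead of resampling a duplicated index. A minimal sketch on the same sample data (the final `sort_values` is an addition to order rows like the desired output):

```python
import pandas as pd

x = pd.DataFrame({'user': ['a', 'a', 'b', 'b', 'a'],
                  'dt': ['2016-01-01', '2016-01-02', '2016-01-05',
                         '2016-01-06', '2016-01-06'],
                  'val': [1, 33, 2, 1, 2]})
x['dt'] = pd.to_datetime(x['dt'])

# Every calendar day from the earliest to the latest date; no resample needed,
# so the same date appearing for several users does not matter.
dates = pd.date_range(x['dt'].min(), x['dt'].max(), freq='D')
users = x['user'].unique()

# (dt, user) pairs are unique, so reindexing on them is safe.
idx = pd.MultiIndex.from_product([dates, users], names=['dt', 'user'])
result = (x.set_index(['dt', 'user'])
           .reindex(idx, fill_value=0)   # missing (dt, user) pairs get val 0
           .reset_index()
           .sort_values(['user', 'dt']))  # group rows by user, as desired
print(result)
```

Because `fill_value=0` is passed to `reindex`, no NaN is introduced and `val` stays an integer column.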

Desired output

          dt user  val
0  2016-01-01    a    1
2  2016-01-02    a   33
4  2016-01-03    a    0
6  2016-01-04    a    0
8  2016-01-05    a    0
10 2016-01-06    a    2
1  2016-01-01    b    0
3  2016-01-02    b    0
5  2016-01-03    b    0
7  2016-01-04    b    0
9  2016-01-05    b    2
11 2016-01-06    b    1

sacuL · Accepted Answer · 2018-05-13 21:32:53Z


Here is one way, reindexing each user to have a date range from your minimum date to your maximum date:

import pandas as pd

# set up your dataframe as you had it before:
x = pd.DataFrame({'user': ['a','a','b','b','a'], 'dt': ['2016-01-01','2016-01-02', '2016-01-05','2016-01-06','2016-01-06'], 'val': [1,33,2,1,2]})
x['dt'] = pd.to_datetime(x['dt'])

# fill with new dates: reindex every user onto the full min..max daily range
filled_df = (x.set_index('dt')
             .groupby('user')
             .apply(lambda d: d.reindex(pd.date_range(min(x.dt),
                                                      max(x.dt),
                                                      freq='D')))
             .drop('user', axis=1)   # user is NaN on new rows; the group key keeps it
             .reset_index('user')    # move the user level back to a column
             .fillna(0))


>>> filled_df
           user   val
2016-01-01    a   1.0
2016-01-02    a  33.0
2016-01-03    a   0.0
2016-01-04    a   0.0
2016-01-05    a   0.0
2016-01-06    a   2.0
2016-01-01    b   0.0
2016-01-02    b   0.0
2016-01-03    b   0.0
2016-01-04    b   0.0
2016-01-05    b   2.0
2016-01-06    b   1.0


2 Comments

Masterbuilder

Thanks, it works. What is the significance of calling reset_index on user alone?

sacuL

You're welcome! You can actually reset both levels of the index. For some reason I was keeping your dt column as the index of the final dataframe, but that's unnecessary. The only catch is that you then need to rename the resulting date column (not a big problem).
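Resetting both index levels, as described in the comment, could look like the sketch below. Two choices here are assumptions not taken from the answer: the date range is given `name='dt'`, so the reindexed level already carries that name and no rename is needed, and the grouped frame is narrowed to `[['val']]` so the grouping column never reaches the lambda (which replaces the `.drop('user', axis=1)` step).

```python
import pandas as pd

x = pd.DataFrame({'user': ['a', 'a', 'b', 'b', 'a'],
                  'dt': ['2016-01-01', '2016-01-02', '2016-01-05',
                         '2016-01-06', '2016-01-06'],
                  'val': [1, 33, 2, 1, 2]})
x['dt'] = pd.to_datetime(x['dt'])

# Naming the range 'dt' gives the reindexed level a usable column name later.
dates = pd.date_range(x['dt'].min(), x['dt'].max(), freq='D', name='dt')

filled_df = (x.set_index('dt')
              .groupby('user')[['val']]           # only 'val' reaches the lambda
              .apply(lambda d: d.reindex(dates))  # index becomes (user, dt)
              .reset_index()                      # both levels back to columns
              .fillna(0))
filled_df['val'] = filled_df['val'].astype(int)   # NaN is gone, so int is safe
print(filled_df)
```

The result has `user`, `dt` and `val` as ordinary columns with a default integer index.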

b2002 · Accepted Answer · 2018-05-13 23:42:22Z

Another way: less elegant than @sacuL's, but almost the same speed.

import pandas as pd
x = pd.DataFrame({'user': ['a','a','b','b','a'],
                  'dt': ['2016-01-01','2016-01-02',
                         '2016-01-05','2016-01-06','2016-01-06'],
                  'val': [1,33,2,1,2]})

users = pd.unique(x.user)
x.dt = pd.to_datetime(x.dt)
dates = pd.date_range(min(x.dt), max(x.dt))
x.set_index('dt', inplace=True)

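# one column per user, aligned on the full date range
df = pd.DataFrame(index=dates)
for u in users:
    df[u] = x[x.user == u].val

# wide -> long: the unnamed levels come back as level_0 / level_1
df = df.unstack().reset_index()
df.rename(columns={'level_0': 'user',
                   'level_1': 'dt',
                   0: 'val'}, inplace=True)
# assign back rather than calling fillna(inplace=True) on a single column
df['val'] = df['val'].fillna(0).astype(int)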
df = df[['dt', 'user', 'val']]

df:

            dt user  val
0   2016-01-01    a    1
1   2016-01-02    a   33
2   2016-01-03    a    0
3   2016-01-04    a    0
4   2016-01-05    a    0
5   2016-01-06    a    2
6   2016-01-01    b    0
7   2016-01-02    b    0
8   2016-01-03    b    0
9   2016-01-04    b    0
10  2016-01-05    b    2
11  2016-01-06    b    1
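The loop over `users` builds a wide table with one column per user, aligned on the date index. As an alternative not used in the answer, `DataFrame.pivot` builds the same wide table in one call (it requires each (dt, user) pair to be unique, which holds for this data), after which the same reindex/unstack steps apply. A sketch, assuming the range is named `'dt'` so the column names come out without a rename:

```python
import pandas as pd

x = pd.DataFrame({'user': ['a', 'a', 'b', 'b', 'a'],
                  'dt': ['2016-01-01', '2016-01-02', '2016-01-05',
                         '2016-01-06', '2016-01-06'],
                  'val': [1, 33, 2, 1, 2]})
x['dt'] = pd.to_datetime(x['dt'])
dates = pd.date_range(x['dt'].min(), x['dt'].max(), freq='D', name='dt')

# One column per user, one row per observed date
# (pivot raises if a (dt, user) pair is repeated).
wide = x.pivot(index='dt', columns='user', values='val')

# Add the missing days, fill the gaps, then return to long format.
df = (wide.reindex(dates)
          .fillna(0)
          .astype(int)
          .unstack()                 # Series indexed by (user, dt)
          .reset_index(name='val'))
df = df[['dt', 'user', 'val']]
print(df)
```

This produces the same rows in the same order as the loop version above.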
