I am trying to fill missing dates by user group, however one of my indexed column has a duplicate date, so I tried to use unique date and re-index it then I am getting length mismatch error.How do I resample by day frequency without getting duplicate error.
import pandas as pandas
x = pandas.DataFrame({'user': ['a','a','b','b','a'], 'dt': ['2016-01-01','2016-01-02', '2016-01-05','2016-01-06','2016-01-06'], 'val': [1,33,2,1,2]})
udates=x['dt'].unique()
x['dt'] = pandas.to_datetime(x['dt'])
dates = x.set_index(udates).resample('D').asfreq().index
users=x['user'].unique()
idx = pandas.MultiIndex.from_product((dates, users), names=['dt', 'user'])
x.set_index(['dt', 'user']).reindex(idx, fill_value=0).reset_index()
print(x)
Desired output
dt user val
0 2016-01-01 a 1
2 2016-01-02 a 33
4 2016-01-03 a 0
6 2016-01-04 a 0
8 2016-01-05 a 0
10 2016-01-06 a 2
1 2016-01-01 b 0
3 2016-01-02 b 0
5 2016-01-03 b 0
7 2016-01-04 b 0
9 2016-01-05 b 2
11 2016-01-06 b 1