Filling missing dates within group with duplicate date pandas python

Question

I am trying to fill missing dates by ID, however one of my indexed column has a duplicate date, so I tried this code but i keep getting this error "cannot reindex from a duplicate axis"

These are two codes I have tried:

FIRST:

udates=data['day'].unique()
filled_df = (data.set_index('day')
             .groupby('ID')
             .apply(lambda d: d.reindex(pd.date_range(min(data.day),
                                                      max(data.day),
                                                      freq='D')))
             .drop('ID', axis=1)
             .reset_index('ID')
             .fillna(0))
filled_df

SECOND ONE:

users = pd.unique(data.ID)
data.day= pd.to_datetime(data.day)
dates = pd.date_range(min(data.day), max(data.day))
data.set_index('day', inplace=True)

df = pd.DataFrame(index=dates)
for u in users:
    df[u] = data[data.ID==u].val

df = df.unstack().reset_index()
df.val.fillna(0, inplace=True)
df.val = df.val.astype(int)
df

this is an extract of the input data:

day	ID	val
01/26/2020	AA	100
01/28/2020	AA	200
01/26/2020	BB	100
01/27/2020	BB	100
01/29/2020	BB	40

This the output I want to get:

day	ID	val
01/26/2020	AA	100
01/27/2020	AA	0
01/28/2020	AA	200
01/26/2020	BB	100
01/27/2020	BB	100
01/28/2020	BB	0
01/29/2020	BB	40

Thank you

EDIT


df = pd.DataFrame({'ID': ['AA', 'AA', 'BB', 'BB','BB'], 
                   'day': ['1/26/2020', '1/28/2020', '1/26/2020', '1/27/2020','1/29/2020'], 'val': [100, 200, 100, 100, 40]})

Can you include the sample data that would be used to create your desired output? See: How to make good reproducible pandas examples for tips — Henry Ecker
– Henry Ecker ♦, Commented Apr 28, 2021 at 20:56
df.set_index('day').groupby("ID").apply(lambda d: d.reindex(pd.date_range(d.index.min(),d.index.max(), freq='D'))) works fine for your code. Kindly share an example data with duplicates within a group — sammywemmy
– sammywemmy, Commented Apr 28, 2021 at 21:16

Corralien · Accepted Answer · 2021-04-28 21:30:31Z

1

>>> df.set_index("day") \
      .groupby("ID")["val"] \
      .resample("D") \
      .first() \
      .fillna(0) \
      .reset_index()

   ID        day    val
0  AA 2020-01-26  100.0
1  AA 2020-01-27    0.0
2  AA 2020-01-28  200.0
3  BB 2020-01-26  100.0
4  BB 2020-01-27  100.0
5  BB 2020-01-28    0.0
6  BB 2020-01-29   40.0

Note: the function first() is useless. It's because Resampler.fillna() only works with the method keyword. You cannot pass a value unlike DataFrame.fillna().

edited Apr 28, 2021 at 21:30

answered Apr 28, 2021 at 21:21

Corralien

121k8 gold badges44 silver badges69 bronze badges

Sign up to request clarification or add additional context in comments.

Collectives™ on Stack Overflow

Filling missing dates within group with duplicate date pandas python

1 Answer 1

Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related