0

I am trying to fill missing dates by ID, however one of my indexed column has a duplicate date, so I tried this code but i keep getting this error "cannot reindex from a duplicate axis"

These are two codes I have tried:

FIRST:

udates=data['day'].unique()
filled_df = (data.set_index('day')
             .groupby('ID')
             .apply(lambda d: d.reindex(pd.date_range(min(data.day),
                                                      max(data.day),
                                                      freq='D')))
             .drop('ID', axis=1)
             .reset_index('ID')
             .fillna(0))
filled_df

SECOND ONE:

users = pd.unique(data.ID)
data.day= pd.to_datetime(data.day)
dates = pd.date_range(min(data.day), max(data.day))
data.set_index('day', inplace=True)

df = pd.DataFrame(index=dates)
for u in users:
    df[u] = data[data.ID==u].val

df = df.unstack().reset_index()
df.val.fillna(0, inplace=True)
df.val = df.val.astype(int)
df

this is an extract of the input data:

day ID val
01/26/2020 AA 100
01/28/2020 AA 200
01/26/2020 BB 100
01/27/2020 BB 100
01/29/2020 BB 40

This the output I want to get:

day ID val
01/26/2020 AA 100
01/27/2020 AA 0
01/28/2020 AA 200
01/26/2020 BB 100
01/27/2020 BB 100
01/28/2020 BB 0
01/29/2020 BB 40

Thank you

EDIT


df = pd.DataFrame({'ID': ['AA', 'AA', 'BB', 'BB','BB'], 
                   'day': ['1/26/2020', '1/28/2020', '1/26/2020', '1/27/2020','1/29/2020'], 'val': [100, 200, 100, 100, 40]})

3
  • Can you include the sample data that would be used to create your desired output? See: How to make good reproducible pandas examples for tips Commented Apr 28, 2021 at 20:56
  • thanks, I edit the question Commented Apr 28, 2021 at 21:05
  • 1
    df.set_index('day').groupby("ID").apply(lambda d: d.reindex(pd.date_range(d.index.min(),d.index.max(), freq='D'))) works fine for your code. Kindly share an example data with duplicates within a group Commented Apr 28, 2021 at 21:16

1 Answer 1

1
>>> df.set_index("day") \
      .groupby("ID")["val"] \
      .resample("D") \
      .first() \
      .fillna(0) \
      .reset_index()

   ID        day    val
0  AA 2020-01-26  100.0
1  AA 2020-01-27    0.0
2  AA 2020-01-28  200.0
3  BB 2020-01-26  100.0
4  BB 2020-01-27  100.0
5  BB 2020-01-28    0.0
6  BB 2020-01-29   40.0

Note: the function first() is useless. It's because Resampler.fillna() only works with the method keyword. You cannot pass a value unlike DataFrame.fillna().

Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.