How do I add missing dates to multi index df?

Question

I have a df in which some dates of the time period are missing for each station. How do I fill in the missing dates per station and set the value (LST) to NaN in a multi-index df?

The df looks like this:


                           LST
Date        Station_Number                
2003-01-01    SWE00137272 -238
2003-01-09    SWE00137272 -172
2003-01-17    SWE00137272 -191
2003-01-25    SWE00137272 -202
2003-02-02    SWE00137272 -297
...                   ...  ...
2020-11-24    GLM00004301 -321
2020-12-02    GLM00004301 -323
2020-12-10    GLM00004301 -347
2020-12-18    GLM00004301 -340
2020-12-26    GLM00004301 -312

[636672 rows x 2 columns]

The time span goes from 01.01.2003 until 31.12.2020. I have tried using:

dates_index = polar_temp.set_index('Date').resample('D').mean().reset_index()['Date'].to_list()
all_possible_dates = pd.DataFrame(product(dates_index, stations), columns=['Date', 'Station_Number'])

date_merge = pd.merge(stations_polar, all_possible_dates, how='outer',on= ['Station_Number','Date'])

However, the missing dates are just appended at the end of the df, and even dates that are present in both dfs are appended.

Ideally the added dates would be set to NaN in the LST column. The output should look like this:

                            LST
Date        Station_Number                 
2003-01-01    SWE00137272 -238
2003-01-02    SWE00137272 NaN
2003-01-03    SWE00137272 NaN
2003-01-04    SWE00137272 NaN
2003-01-05    SWE00137272 NaN
2003-01-06    SWE00137272 NaN
2003-01-07    SWE00137272 NaN
2003-01-08    SWE00137272 NaN
2003-01-09    SWE00137272 -172
2003-01-10    SWE00137272 NaN

The dates for each station should continue as one continuous daily series from 2003 to 2020, with the added dates set to NaN.

Nk03 · Accepted Answer · 2021-06-10 10:33:44Z


Take Station_Number out of the index.
Convert the date index to datetime (if required).
Resample to daily frequency, then forward-fill Station_Number.

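import pandas as pd

# Move Station_Number (the last index level) into a regular column
df1 = df.reset_index(-1)
# Make sure the remaining Date index is a DatetimeIndex so resample works
df1.index = pd.to_datetime(df1.index)
# Insert one row per day; ffill restores Station_Number on the new rows (assumes a single station)
df1 = df1.resample('D').first().assign(Station_Number = lambda x: x['Station_Number'].ffill())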
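
As an alternative that covers all stations at once, the frame can be reindexed against the full grid of days and stations. This is a minimal sketch, not the answer's method: the toy df below is an assumption standing in for the question's data, and the names all_days, stations, full_index and filled are illustrative only.

```python
import pandas as pd

# Toy stand-in for the question's frame (assumed data):
# MultiIndex (Date, Station_Number) with a single LST column.
df = pd.DataFrame(
    {
        "Date": pd.to_datetime(
            ["2003-01-01", "2003-01-03", "2003-01-02", "2003-01-04"]
        ),
        "Station_Number": [
            "SWE00137272", "SWE00137272", "GLM00004301", "GLM00004301"
        ],
        "LST": [-238, -172, -321, -323],
    }
).set_index(["Date", "Station_Number"])

# Every calendar day in the overall span of the data ...
dates = df.index.get_level_values("Date")
all_days = pd.date_range(dates.min(), dates.max(), freq="D")
# ... crossed with every station that occurs in the index.
stations = df.index.get_level_values("Station_Number").unique()
full_index = pd.MultiIndex.from_product(
    [all_days, stations], names=["Date", "Station_Number"]
)

# reindex keeps the existing rows and adds NaN rows for every
# (Date, Station_Number) pair that was missing.
filled = df.reindex(full_index).sort_index(level=["Station_Number", "Date"])
print(filled)
```

For the question's data, fixed endpoints such as pd.date_range("2003-01-01", "2020-12-31", freq="D") could replace the min/max lookup so that every station spans the whole period.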


1 Comment

tillwss Over a year ago

Nice, it worked, but only for one station. Is there a way to apply that function to the other stations as well?
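
One possible way to apply the resample step to every station is to group by Station_Number first, so each station is resampled separately. The sketch below is an assumption, not something from the answer: fill_daily_per_station is a hypothetical helper and the sample frame is made up. Note that each station is filled only between its own first and last date, not over the global 2003 to 2020 span.

```python
import pandas as pd


def fill_daily_per_station(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: resample each station to daily rows separately."""
    # Move Station_Number into a column so only Date remains in the index.
    flat = df.reset_index("Station_Number")
    flat.index = pd.to_datetime(flat.index)
    # Resample inside each station group; days without data become NaN.
    daily = flat.groupby("Station_Number")["LST"].resample("D").first()
    # The result is indexed by (Station_Number, Date); restore (Date, Station_Number).
    out = daily.to_frame().swaplevel()
    return out.sort_index(level=["Station_Number", "Date"])


# Made-up sample data with gaps in both stations.
sample = pd.DataFrame(
    {
        "Date": pd.to_datetime(
            ["2003-01-01", "2003-01-03", "2003-01-02", "2003-01-05"]
        ),
        "Station_Number": [
            "SWE00137272", "SWE00137272", "GLM00004301", "GLM00004301"
        ],
        "LST": [-238, -172, -321, -323],
    }
).set_index(["Date", "Station_Number"])

result = fill_daily_per_station(sample)
print(result)
```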
