I have a df in which some dates within the time period are missing for each station. How do I fill in the missing dates per station and set the value (the LST column) to NaN in a MultiIndex df?
The df looks like this:
                           LST
Date       Station_Number
2003-01-01 SWE00137272    -238
2003-01-09 SWE00137272    -172
2003-01-17 SWE00137272    -191
2003-01-25 SWE00137272    -202
2003-02-02 SWE00137272    -297
...        ...             ...
2020-11-24 GLM00004301    -321
2020-12-02 GLM00004301    -323
2020-12-10 GLM00004301    -347
2020-12-18 GLM00004301    -340
2020-12-26 GLM00004301    -312
[636672 rows x 2 columns]
The time span goes from 01.01.2003 until 31.12.2020. I have tried using:
import pandas as pd
from itertools import product

dates_index = polar_temp.set_index('Date').resample('D').mean().reset_index()['Date'].to_list()
all_possible_dates = pd.DataFrame(product(dates_index, stations), columns=['Date', 'Station_Number'])
date_merge = pd.merge(stations_polar, all_possible_dates, how='outer', on=['Station_Number', 'Date'])
However, the missing dates are just appended at the end of the df, and even dates that are present in both dfs are appended again.
Ideally the added dates would be set to NaN in the LST column. The output should look like this:
                           LST
Date       Station_Number
2003-01-01 SWE00137272    -238
2003-01-02 SWE00137272     NaN
2003-01-03 SWE00137272     NaN
2003-01-04 SWE00137272     NaN
2003-01-05 SWE00137272     NaN
2003-01-06 SWE00137272     NaN
2003-01-07 SWE00137272     NaN
2003-01-08 SWE00137272     NaN
2003-01-09 SWE00137272    -172
2003-01-10 SWE00137272     NaN
The dates should continue like this for each station as a continuous daily series from 2003 to 2020, with the added dates set to NaN.
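
For reference, one possible way to get output of this shape is to build the full (date, station) index up front and reindex onto it, rather than merging. This is only a minimal sketch under some assumptions: the frame already has a (Date, Station_Number) MultiIndex with datetime dates, each (date, station) pair occurs at most once, and the names `sample` and `fill_missing_dates` are made up for illustration (the data below is a tiny invented sample, not the real 636672-row frame).

```python
import pandas as pd

# Small made-up sample with the same layout as the frame in the question:
# a (Date, Station_Number) MultiIndex and a single LST column.
sample = pd.DataFrame(
    {
        "Date": pd.to_datetime(
            ["2003-01-01", "2003-01-09", "2003-01-03", "2003-01-11"]
        ),
        "Station_Number": [
            "SWE00137272", "SWE00137272", "GLM00004301", "GLM00004301",
        ],
        "LST": [-238, -172, -321, -323],
    }
).set_index(["Date", "Station_Number"])


def fill_missing_dates(df, start, end):
    """Return df with one row per station and per day between start and end.

    Hypothetical helper; assumes the index of df has no duplicate entries.
    """
    all_dates = pd.date_range(start, end, freq="D", name="Date")
    stations = df.index.get_level_values("Station_Number").unique()
    # Build the product station-major so each station's dates stay together,
    # then swap the levels back to (Date, Station_Number).
    full_index = pd.MultiIndex.from_product(
        [stations, all_dates], names=["Station_Number", "Date"]
    ).swaplevel()
    # Index entries that are not in df get NaN in every column.
    return df.reindex(full_index)


filled = fill_missing_dates(sample, "2003-01-01", "2003-01-11")
print(filled.head(10))
```

For the real data the call would presumably use "2003-01-01" and "2020-12-31" as the bounds. Note that LST becomes a float column once NaN values are introduced.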