I have a DataFrame object representing a time series indexed in minute resolution. There are segments in which there are no entries, which have been identified as follows:
missing = pd.date_range(df.index.min(), df.index.max(), freq='1min').difference(df.index)
Doing a quick print of this results in the following output:
DatetimeIndex(['2021-02-11 03:41:00', '2021-02-11 03:42:00',
'2021-02-11 03:43:00', '2021-02-11 03:44:00',
'2021-02-11 03:45:00', '2021-02-11 03:46:00',
'2021-02-11 03:47:00', '2021-02-11 03:48:00',
'2021-02-11 03:49:00', '2021-02-11 03:50:00',
...
'2021-09-29 08:50:00', '2021-09-29 08:51:00',
'2021-09-29 08:52:00', '2021-09-29 08:53:00',
'2021-09-29 08:54:00', '2021-09-29 08:55:00',
'2021-09-29 08:56:00', '2021-09-29 08:57:00',
'2021-09-29 08:58:00', '2021-09-29 08:59:00'],
dtype='datetime64[ns]', length=5057, freq=None)
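For anyone who wants to reproduce this without my data, here is a minimal sketch of the same gap detection on a toy frame. The three timestamps and the single column `a` are made up for illustration; only the `date_range(...).difference(...)` line matches what I actually run.

```python
import pandas as pd

# Hypothetical toy data: minute-resolution index with 23:57 and 23:58 absent.
idx = pd.to_datetime([
    "2021-12-31 23:55:00",
    "2021-12-31 23:56:00",
    "2021-12-31 23:59:00",
])
df = pd.DataFrame({"a": [3679.36, 3676.79, 3675.10]}, index=idx)

# Build the complete minute grid between the first and last timestamp,
# then keep only the timestamps that are not present in df.index.
missing = pd.date_range(df.index.min(), df.index.max(), freq="1min").difference(df.index)
print(missing)
```

On this toy frame, `missing` should contain exactly the two absent minutes.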
My question is how to insert each of these timestamps as a new row into the original DataFrame, with NaN in every existing column, like so:
| date | a | b | c | d |
|---|---|---|---|---|
| 2021-12-31 23:59:00 | 3675.10 | 3678.63 | 3675.09 | 3676.23 |
| 2021-12-31 23:58:00 | NaN | NaN | NaN | NaN |
| 2021-12-31 23:57:00 | NaN | NaN | NaN | NaN |
| 2021-12-31 23:56:00 | 3676.79 | 3678.54 | 3676.32 | 3676.94 |
| 2021-12-31 23:55:00 | 3679.36 | 3679.36 | 3676.28 | 3676.65 |
In this contrived example, the timestamps 2021-12-31 23:58:00 and 2021-12-31 23:57:00 would have been identified by the date_range/difference step above and then inserted into the original DataFrame with NaN in every column. I am not interested in interpolating between the surrounding values.
I suspect this can be done with the resample function. I use it often to turn n periods into n/m periods, but I am not fluent enough to know how to apply it in this case. Suggestions?
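To make the goal concrete, here is a minimal sketch of the kind of result I am after, assuming the index is a unique `DatetimeIndex`. It tries two candidates I believe exist in pandas, `DataFrame.reindex` against the full minute range and `resample("1min").asfreq()`; the toy values are taken from the table above, and I have not checked either approach against my full data.

```python
import numpy as np
import pandas as pd

# Toy frame built from the example table, with 23:57 and 23:58 absent.
idx = pd.to_datetime([
    "2021-12-31 23:55:00",
    "2021-12-31 23:56:00",
    "2021-12-31 23:59:00",
])
df = pd.DataFrame(
    {
        "a": [3679.36, 3676.79, 3675.10],
        "b": [3679.36, 3678.54, 3678.63],
        "c": [3676.28, 3676.32, 3675.09],
        "d": [3676.65, 3676.94, 3676.23],
    },
    index=idx,
).rename_axis("date")

# Candidate 1: reindex onto the complete minute grid.
# Timestamps that were not in df get NaN in every column.
full_range = pd.date_range(df.index.min(), df.index.max(), freq="1min")
filled = df.reindex(full_range).rename_axis("date")

# Candidate 2: resample to 1-minute bins and take values as-is (no aggregation,
# no interpolation); empty bins come back as NaN.
resampled = df.resample("1min").asfreq()

# Newest-first ordering, to match the table in the question.
print(filled.sort_index(ascending=False))
```

If both candidates behave as I expect, `filled` and `resampled` should hold the same values, and no interpolation takes place in either.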