Add missing dates into TimeSeries DataFrame with 'NaN' values for existing columns

Question

I have a DataFrame object representing a time series indexed at minute resolution. Some segments have no entries, and I have identified them as follows:

missing = pd.date_range(df.index.min(), df.index.max(), freq='1min').difference(df.index)

Doing a quick print of this results in the following output:

DatetimeIndex(['2021-02-11 03:41:00', '2021-02-11 03:42:00',
           '2021-02-11 03:43:00', '2021-02-11 03:44:00',
           '2021-02-11 03:45:00', '2021-02-11 03:46:00',
           '2021-02-11 03:47:00', '2021-02-11 03:48:00',
           '2021-02-11 03:49:00', '2021-02-11 03:50:00',
           ...
           '2021-09-29 08:50:00', '2021-09-29 08:51:00',
           '2021-09-29 08:52:00', '2021-09-29 08:53:00',
           '2021-09-29 08:54:00', '2021-09-29 08:55:00',
           '2021-09-29 08:56:00', '2021-09-29 08:57:00',
           '2021-09-29 08:58:00', '2021-09-29 08:59:00'],
          dtype='datetime64[ns]', length=5057, freq=None)
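For readers who want to reproduce the gap detection on a small scale, here is a minimal sketch. The three-row frame is an assumption for illustration (values borrowed from the example further down), not the real data set:

```python
import pandas as pd

# Toy minute-resolution series with 23:57 and 23:58 absent (illustrative data).
idx = pd.to_datetime([
    "2021-12-31 23:55:00",
    "2021-12-31 23:56:00",
    "2021-12-31 23:59:00",
])
df = pd.DataFrame({"a": [3679.36, 3676.79, 3675.10]}, index=idx)

# Every minute between the first and last timestamp, inclusive.
full = pd.date_range(df.index.min(), df.index.max(), freq="1min")

# Timestamps present in the full range but absent from the DataFrame.
missing = full.difference(df.index)
print(missing)
```

The result of `difference` is itself a `DatetimeIndex`, so it can be inspected or counted like any other index.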

My question is how to insert each of these values as a new row into the initial DataFrame object with NaN values for all columns existing in that initial dataframe as such:

date	a	b	c	d
2021-12-31 23:59:00	3675.10	3678.63	3675.09	3676.23
2021-12-31 23:58:00	NaN	NaN	NaN	NaN
2021-12-31 23:57:00	NaN	NaN	NaN	NaN
2021-12-31 23:56:00	3676.79	3678.54	3676.32	3676.94
2021-12-31 23:55:00	3679.36	3679.36	3676.28	3676.65

Where, in this contrived example, the dates 2021-12-31 23:58:00 and 2021-12-31 23:57:00 would have been identified via the date_range function before and then inserted with NaN values into the initial DataFrame. In this case, I am not interested in interpolating between surrounding values.

@Umar.H The input DataFrame is essentially the same as that in the last code block, just without the rows containing NaN values.
– alphazwest, Commented Jun 24, 2022 at 14:00
@programmar I am only vaguely familiar with the resample function. I often use it to turn n periods into n/m periods, but I am not fluent enough to know how to apply it in this case. Suggestions?
– alphazwest, Commented Jun 24, 2022 at 14:02
Please take a moment to read "minimal reproducible example" and "How to Ask". You need to provide a representative sample of your input and output DataFrames.
– Umar.H, Commented Jun 24, 2022 at 14:04

programmar · Accepted Answer · 2022-06-24 14:19:26Z

Use resample!

df.resample('1T').fillna(method=None)

Note: '1T' is the pandas offset alias for a 1-minute frequency ('1min' means the same thing).
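As an aside for readers on newer pandas: I believe `Resampler.fillna` and the `'T'` alias may emit deprecation warnings in recent releases. A minimal sketch of the same upsampling with `asfreq()` and the `'1min'` alias follows; the small frame is an assumption for illustration only:

```python
import pandas as pd

# Illustrative frame with a two-minute gap (23:57 and 23:58 are absent).
idx = pd.to_datetime([
    "2021-12-31 23:55:00",
    "2021-12-31 23:56:00",
    "2021-12-31 23:59:00",
])
df = pd.DataFrame(
    {"a": [3679.36, 3676.79, 3675.10], "b": [3679.36, 3678.54, 3678.63]},
    index=idx,
)

# Upsample to one row per minute; asfreq() leaves the new rows unfilled (NaN).
filled = df.resample("1min").asfreq()
print(filled)
```

Existing rows keep their values, and nothing is interpolated, which matches what the question asks for.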

And here is some code that works, recreating the data frame from your sample data. But next time, please provide this yourself. You will get better and faster answers!

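import pandas
from io import StringIO


# Sample data from the question; columns are separated by two spaces
s = """date  a  b  c  d
2021-12-31 23:59:00  3675.10  3678.63  3675.09  3676.23
2021-12-31 23:56:00  3676.79  3678.54  3676.32  3676.94
2021-12-31 23:55:00  3679.36  3679.36  3676.28  3676.65"""


csvStringIO = StringIO(s)

# A multi-character separator is treated as a regex, which needs the Python parsing engine
df = pandas.read_csv(csvStringIO, sep='  ', index_col='date', engine='python')
df.index = pandas.to_datetime(df.index)

# Sort the index into ascending order before resampling
df = df.sort_index()

# Upsample to one row per minute; minutes without data become NaN rows
# (df.resample('1min').asfreq() is an equivalent spelling)
df.resample('1T').fillna(method=None)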

Output:

                           a        b        c        d
date                                                   
2021-12-31 23:55:00  3679.36  3679.36  3676.28  3676.65
2021-12-31 23:56:00  3676.79  3678.54  3676.32  3676.94
2021-12-31 23:57:00      NaN      NaN      NaN      NaN
2021-12-31 23:58:00      NaN      NaN      NaN      NaN
2021-12-31 23:59:00  3675.10  3678.63  3675.09  3676.23

Edit: Your index is unsorted, so I had to add the line with sort_index(). Keep that in mind.
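An alternative that builds directly on the `date_range` call from the question is `reindex`. This is a sketch under assumptions, not part of the original answer: the frame is illustrative, and the index labels must be unique, since `reindex` raises on duplicates.

```python
import pandas as pd

# Descending sample mirroring the question's layout (illustrative values).
idx = pd.to_datetime([
    "2021-12-31 23:59:00",
    "2021-12-31 23:56:00",
    "2021-12-31 23:55:00",
])
df = pd.DataFrame(
    {"a": [3675.10, 3676.79, 3679.36], "d": [3676.23, 3676.94, 3676.65]},
    index=idx,
)
df.index.name = "date"

# Full minute-by-minute range; name= keeps the index label on the result.
full = pd.date_range(df.index.min(), df.index.max(), freq="1min", name="date")

# Rows for timestamps not found in df are created with NaN in every column.
filled = df.reindex(full)
print(filled)
```

Because `reindex` looks rows up by label, the input does not have to be sorted first; the output follows the order of `full`.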

edited Jun 24, 2022 at 14:19

answered Jun 24, 2022 at 14:10


2 Comments

alphazwest Over a year ago

This works great! The only issue is that the resulting data is in reversed order -- any idea how to avoid that? It's no issue to simply use .iloc[::-1] on the resulting DataFrame, I'm mostly just curious.

programmar Over a year ago

I just fixed that. How is it now?
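If the newest-first order of the original data should be kept, one option is to sort descending again after upsampling. A minimal sketch, assuming an illustrative frame and using `asfreq()` in place of the `fillna(method=None)` call from the answer:

```python
import pandas as pd

# Newest-first sample, as in the question (illustrative values).
idx = pd.to_datetime([
    "2021-12-31 23:59:00",
    "2021-12-31 23:56:00",
    "2021-12-31 23:55:00",
])
df = pd.DataFrame({"a": [3675.10, 3676.79, 3679.36]}, index=idx)

# Sort ascending, upsample to one row per minute, then flip back to newest-first.
filled = (
    df.sort_index()
      .resample("1min")
      .asfreq()
      .sort_index(ascending=False)
)
print(filled)
```

This gives the same rows as `.iloc[::-1]` on the ascending result; `sort_index(ascending=False)` just states the intent more explicitly.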
