
I have a collection of transactions with a date and a price column:

+---------------------------+-------+
|           Date            | Price |
+---------------------------+-------+
| 2016-05-27 10:02:24+00:00 |  2.90 |
| 2016-05-27 10:02:24+00:00 | 14.90 |
| 2016-05-29 07:47:09+00:00 | 12.90 |
| 2016-05-29 11:56:32+00:00 | 16.90 |
| 2016-05-29 22:10:08+00:00 | 11.92 |
+---------------------------+-------+

As the table shows, transactions did not happen every day, and on some days several transactions happened.

My question is: how can I create a DataFrame with dates from the oldest transaction to the newest, adding the missing dates with price 0 while keeping multiple rows for transactions that happened on the same day? The desired result is shown in the following table:

+---------------------------+-------+
|           Date            | Price |
+---------------------------+-------+
| 2016-05-27 10:02:24+00:00 |  2.90 |
| 2016-05-27 10:02:24+00:00 | 14.90 |
| 2016-05-28 00:00:00+00:00 |  0.00 |
| 2016-05-29 07:47:09+00:00 | 12.90 |
| 2016-05-29 11:56:32+00:00 | 16.90 |
| 2016-05-29 22:10:08+00:00 | 11.92 |
+---------------------------+-------+ 

I have tried to create a series with pd.date_range from the oldest date to the newest and then assign the series to the DataFrame, but since the series length does not match the number of rows, this leaves missing values:

d2 = pd.Series(pd.date_range(min(df.Date), max(df.Date)))

df['dates'] = d2 

2 Answers


You can find which dates are missing, then concatenate the missing dates back:

import pandas as pd

missings = [x for x in pd.date_range(df.Date.min().date(), df.Date.max().date(), freq='1D').date
            if x not in df.Date.dt.date.unique()]

df = (pd.concat([df, pd.DataFrame({'Date': pd.to_datetime(missings).tz_localize('UTC'), 'Price': 0})])
        .sort_values('Date'))

Output:

                       Date  Price
0 2016-05-27 10:02:24+00:00   2.90
1 2016-05-27 10:02:24+00:00  14.90
0 2016-05-28 00:00:00+00:00   0.00
2 2016-05-29 07:47:09+00:00  12.90
3 2016-05-29 11:56:32+00:00  16.90
4 2016-05-29 22:10:08+00:00  11.92

It is also possible to find the missing dates with sets, which should be a bit faster:

missings = list(set(pd.date_range(df.Date.min().date(), df.Date.max().date(), freq='1D', tz='UTC').values) 
                 - set(df.Date.dt.normalize().values))
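For reference, here is a self-contained sketch of the set-based variant applied to the sample data from the question (the column names `Date` and `Price` are taken from the question; everything else follows the answer's approach):

```python
import pandas as pd

# Sample data matching the question (timestamps are tz-aware, UTC)
df = pd.DataFrame({
    'Date': pd.to_datetime([
        '2016-05-27 10:02:24+00:00', '2016-05-27 10:02:24+00:00',
        '2016-05-29 07:47:09+00:00', '2016-05-29 11:56:32+00:00',
        '2016-05-29 22:10:08+00:00',
    ]),
    'Price': [2.90, 14.90, 12.90, 16.90, 11.92],
})

# Days in the full min-max range that have no transaction
missings = list(set(pd.date_range(df.Date.min().date(), df.Date.max().date(),
                                  freq='1D', tz='UTC'))
                - set(df.Date.dt.normalize()))

# Append the missing days with price 0 and restore chronological order
out = (pd.concat([df, pd.DataFrame({'Date': missings, 'Price': 0.0})])
         .sort_values('Date')
         .reset_index(drop=True))
print(out)
```

The only missing day in this sample is 2016-05-28, so `out` has six rows, with a single zero-price row at midnight of that day.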



You can create a Series over that min-max date range, outer-merge, and fill the resulting NaNs with 0:

df.Date = pd.to_datetime(df.Date)
rng = pd.date_range(start=df.Date.min(), end=df.Date.max(), freq='D')
df = df.set_index('Date')
pd.merge(df, pd.Series(index=rng, name='rng', dtype=float), how='outer', left_index=True, right_index=True).drop(columns='rng').fillna(0)

Output:

    Price
2016-05-27 10:02:24     2.900
2016-05-27 10:02:24     14.900
2016-05-28 10:02:24     0.000
2016-05-29 07:47:09     12.900
2016-05-29 10:02:24     0.000
2016-05-29 11:56:32     16.900
2016-05-29 22:10:08     11.920

Note that I ignored the UTC offsets for convenience; I don't think it affects the solution. Also note that the times for the interpolated days will match the time of day of your minimum date.
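A self-contained sketch of this merge approach, run on the question's sample data (offsets dropped as noted above; `rng` is the helper column name from the answer):

```python
import pandas as pd

# Sample data from the question, without the UTC offsets for brevity
df = pd.DataFrame({
    'Date': pd.to_datetime([
        '2016-05-27 10:02:24', '2016-05-27 10:02:24',
        '2016-05-29 07:47:09', '2016-05-29 11:56:32',
        '2016-05-29 22:10:08',
    ]),
    'Price': [2.90, 14.90, 12.90, 16.90, 11.92],
})

# Daily range anchored at the earliest timestamp
rng = pd.date_range(start=df.Date.min(), end=df.Date.max(), freq='D')

# Outer-merge on the index: days with no transaction appear as NaN rows,
# which fillna(0) then turns into zero-price rows
out = (df.set_index('Date')
         .merge(pd.Series(0.0, index=rng, name='rng'),
                how='outer', left_index=True, right_index=True)
         .drop(columns='rng')
         .fillna(0))
print(out)
```

Because `rng` starts at the minimum timestamp (10:02:24), both 2016-05-28 10:02:24 and 2016-05-29 10:02:24 are added as zero-price rows, giving seven rows in total, which matches the output shown above.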

