
I have a time series dataset of products, given below:

date    product price   amount
11/17/2019  A   10  20
11/19/2019  A   15  20
11/24/2019  A   20  30
12/01/2019  C   40  50
12/05/2019  C   45  35

This data has missing days (dates are in "MM/dd/YYYY" format) between the start and end date of each product. I am trying to fill the missing dates with zero rows, converting the previous table into the one below:

date    product price   amount
11/17/2019  A   10  20
11/18/2019  A   0   0
11/19/2019  A   15  20
11/20/2019  A   0   0
11/21/2019  A   0   0
11/22/2019  A   0   0
11/23/2019  A   0   0
11/24/2019  A   20  30
12/01/2019  C   40  50
12/02/2019  C   0   0
12/03/2019  C   0   0
12/04/2019  C   0   0
12/05/2019  C   45  35

To do this conversion, I used the following code:

import pandas as pd
import numpy as np
data=pd.read_csv("test.txt", sep="\t", parse_dates=['date'])
data=data.set_index(["date", "product"])
start=data.first_valid_index()[0]
end=data.last_valid_index()[0]
df=data.set_index("date").reindex(pd.date_range(start,end, freq="1D"), fill_values=0)

However, the code raises an error. Is there an efficient way to do this conversion?
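To see why the attempt above fails, here is a minimal reproduction on the question's sample data (inlined with io.StringIO instead of reading test.txt). It shows two separate problems: set_index("date") is called after "date" has already been moved into the index, and reindex is passed fill_values instead of fill_value. Even with both fixed, a single date_range from the global start to end would not fill gaps per product, which is what the answers below address.

```python
import io

import pandas as pd

# The question's sample, inlined so the sketch does not depend on test.txt.
raw = """date\tproduct\tprice\tamount
11/17/2019\tA\t10\t20
11/19/2019\tA\t15\t20
11/24/2019\tA\t20\t30
12/01/2019\tC\t40\t50
12/05/2019\tC\t45\t35
"""
data = pd.read_csv(io.StringIO(raw), sep="\t", parse_dates=["date"])
data = data.set_index(["date", "product"])

errors = []

# 1) "date" is now an index level, not a column, so set_index("date") raises KeyError.
try:
    data.set_index("date")
except KeyError as exc:
    errors.append(type(exc).__name__)
    print("set_index:", repr(exc))

# 2) DataFrame.reindex accepts fill_value (singular); fill_values is rejected.
dates_only = data.reset_index(level="product")
try:
    dates_only.reindex(pd.date_range("2019-11-17", "2019-12-05", freq="D"), fill_values=0)
except TypeError as exc:
    errors.append(type(exc).__name__)
    print("reindex:", exc)

print(errors)
```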


If you need to add 0 rows for the missing datetimes of each product separately, use a custom function in GroupBy.apply that calls DataFrame.reindex over the range from the group's minimal to maximal datetime:

import pandas as pd

df = pd.read_csv("test.txt", sep="\t", parse_dates=['date'])

# reindex one product's rows over every day from its first to its last date, filling new rows with 0
f = lambda x: x.reindex(pd.date_range(x.index.min(), 
                                      x.index.max(), name='date'), fill_value=0)
df = (df.set_index('date')
        .groupby('product')
        .apply(f)
        .drop('product', axis=1)
        .reset_index())
print(df)
   product       date  price  amount
0        A 2019-11-17     10      20
1        A 2019-11-18      0       0
2        A 2019-11-19     15      20
3        A 2019-11-20      0       0
4        A 2019-11-21      0       0
5        A 2019-11-22      0       0
6        A 2019-11-23      0       0
7        A 2019-11-24     20      30
8        C 2019-12-01     40      50
9        C 2019-12-02      0       0
10       C 2019-12-03      0       0
11       C 2019-12-04      0       0
12       C 2019-12-05     45      35
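As an alternative that avoids GroupBy.apply (which recent pandas versions warn about when the applied function also sees the grouping column), one can build the full (product, date) MultiIndex first and reindex once. This is a sketch assuming the question's tab-separated sample, inlined via io.StringIO instead of test.txt:

```python
import io

import pandas as pd

raw = """date\tproduct\tprice\tamount
11/17/2019\tA\t10\t20
11/19/2019\tA\t15\t20
11/24/2019\tA\t20\t30
12/01/2019\tC\t40\t50
12/05/2019\tC\t45\t35
"""
df = pd.read_csv(io.StringIO(raw), sep="\t", parse_dates=["date"])

# Every (product, date) pair from each product's first date to its last date.
full_index = pd.MultiIndex.from_tuples(
    [
        (product, day)
        for product, group in df.groupby("product")
        for day in pd.date_range(group["date"].min(), group["date"].max(), freq="D")
    ],
    names=["product", "date"],
)

# A single reindex over the whole frame; the new rows get 0 instead of NaN.
out = (
    df.set_index(["product", "date"])
    .reindex(full_index, fill_value=0)
    .reset_index()
)
print(out)
```

Because fill_value=0 is passed to reindex directly, price and amount stay integer columns and no separate fillna step is needed.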

Make sure the 'date' column has a datetime dtype, or you will get an error. The code above handles this by using parse_dates when reading the data, but if the data is already in a DataFrame, use df['date'] = pd.to_datetime(df['date']).
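A small sketch of that conversion on a hypothetical two-row frame, passing an explicit format that matches the question's MM/dd/YYYY dates so pandas does not have to guess the day/month order:

```python
import pandas as pd

# Hypothetical frame whose 'date' column still holds plain strings.
df = pd.DataFrame({"date": ["11/17/2019", "11/19/2019"], "product": ["A", "A"]})

# %m/%d/%Y matches the question's month/day/year layout.
df["date"] = pd.to_datetime(df["date"], format="%m/%d/%Y")

print(df.dtypes)
```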

One option is to use the complete function from pyjanitor to expose the missing rows per group:

#pip install git+https://github.com/pyjanitor-devs/pyjanitor.git
import pandas as pd
import janitor

# build the dates to be applied per group
dates = dict(date = lambda df: pd.date_range(df.min(), df.max(), freq='1D'))

df.complete(dates, by='product', sort = True).fillna(0, downcast='infer')

                   date product  price  amount
0   2019-11-17 00:00:00       A     10      20
1   2019-11-18 00:00:00       A      0       0
2   2019-11-19 00:00:00       A     15      20
3   2019-11-20 00:00:00       A      0       0
4   2019-11-21 00:00:00       A      0       0
5   2019-11-22 00:00:00       A      0       0
6   2019-11-23 00:00:00       A      0       0
7   2019-11-24 00:00:00       A     20      30
8   2019-12-01 00:00:00       C     40      50
9   2019-12-02 00:00:00       C      0       0
10  2019-12-03 00:00:00       C      0       0
11  2019-12-04 00:00:00       C      0       0
12  2019-12-05 00:00:00       C     45      35
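Note that the downcast keyword of fillna was deprecated in pandas 2.2, so the last line above may emit a FutureWarning on newer versions. A version-independent alternative is fillna(0) followed by an explicit astype. The sketch below uses plain pandas on a hand-built frame that mimics what complete returns (added rows holding NaN, so the numeric columns are float64); it does not call pyjanitor itself:

```python
import numpy as np
import pandas as pd

# Hand-built stand-in for complete()'s output: the added rows hold NaN,
# which has upcast price and amount to float64.
filled = pd.DataFrame(
    {
        "date": pd.date_range("2019-12-01", "2019-12-05", freq="D"),
        "product": ["C"] * 5,
        "price": [40, np.nan, np.nan, np.nan, 45],
        "amount": [50, np.nan, np.nan, np.nan, 35],
    }
)

# Fill the gaps with 0, then cast the numeric columns back to integers explicitly.
result = filled.fillna(0).astype({"price": "int64", "amount": "int64"})
print(result)
```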


There's an easier method for this case:

from datetime import timedelta

import pandas as pd

#this assumes `data` is indexed by date only (one product), e.g.:
#data = pd.read_csv("test.txt", sep="\t", parse_dates=["date"], index_col="date")

#create the full date range, and then create a DataFrame with the range
#if needed, you can expand the range a bit using datetime.timedelta()
alldates=pd.DataFrame(pd.date_range(data.index.min()-timedelta(1),
                                    data.index.max()+timedelta(4),
                                    freq="1D", name="newdate"))

#make 'newdate' the index, and you no longer need it as a column
alldates.index=alldates.newdate
alldates.drop(columns="newdate", inplace=True)

#now, join the tables, missing dates in the original table will be filled with NaN
data=alldates.join(data)
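For a single product indexed by date, as this answer assumes, DataFrame.asfreq("D", fill_value=0) does the reindex and the zero fill in one call, over the range from the first to the last date. Like the join above, it does not separate products, so for the question's full data it would have to run per product. A sketch using only product A from the question's sample, inlined via io.StringIO:

```python
import io

import pandas as pd

raw = """date\tproduct\tprice\tamount
11/17/2019\tA\t10\t20
11/19/2019\tA\t15\t20
11/24/2019\tA\t20\t30
"""
data = pd.read_csv(io.StringIO(raw), sep="\t", parse_dates=["date"], index_col="date")

# asfreq("D") inserts every missing day between the first and last date, and
# fill_value=0 puts zeros in those new rows instead of NaN. Only the numeric
# columns are selected so the product label is not overwritten with 0.
daily = data[["price", "amount"]].asfreq("D", fill_value=0)
print(daily)
```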
