I have a time series dataset of product given below:
date product price amount
11/17/2019 A 10 20
11/19/2019 A 15 20
11/24/2019 A 20 30
12/01/2019 C 40 50
12/05/2019 C 45 35
This data has a missing days ("MM/dd/YYYY") between the start and end date of data for each product. I am trying to fill missing date with zero rows and convert to previous table into a table given below:
date product price amount
11/17/2019 A 10 20
11/18/2019 A 0 0
11/19/2019 A 15 20
11/20/2019 A 0 0
11/21/2019 A 0 0
11/22/2019 A 0 0
11/23/2019 A 0 0
11/24/2019 A 20 30
12/01/2019 C 40 50
12/02/2019 C 0 0
12/03/2019 C 0 0
12/04/2019 C 0 0
12/05/2019 C 45 35
To get this conversion, I used the code:
import pandas as pd
import numpy as np
data=pd.read_csv("test.txt", sep="\t", parse_dates=['date'])
data=data.set_index(["date", "product"])
start=data.first_valid_index()[0]
end=data.last_valid_index()[0]
df=data.set_index("date").reindex(pd.date_range(start,end, freq="1D"), fill_values=0)
However the code gives an error. Is there any way to get this conversion efficiently?