import pandas as pd

df = pd.read_csv(
    'https://media-doselect.s3.amazonaws.com/generic/MJjpYqLzv08xAkjqLp1ga1Aq/Historical_Data.csv')
df.head()
Date Article_ID Country_Code Sold_Units
0 20170817 1132 AT 1
1 20170818 1132 AT 1
2 20170821 1132 AT 1
3 20170822 1132 AT 1
4 20170906 1132 AT 1
I have the DataFrame shown above. Note that the Date column is of type int64 and that the 19th and 20th of August are missing.
I want to convert Date to the format yyyy-mm-dd and add rows for the missing dates, filled with 0 in Article ID, Outlet Code and Sold Units.
So far I have tried:
df['Date'] = pd.to_datetime(df['Date'].astype(str), format='%Y%m%d')
to get the dates in the required format.
Date Article_ID Outlet_Code Sold_Units
0 2017-08-17 1132 AT 1
1 2017-08-18 1132 AT 1
2 2017-08-21 1132 AT 1
3 2017-08-22 1132 AT 1
4 2017-09-06 1132 AT 1
However, how do I add rows for the missing dates (the 19th and 20th) and fill the other columns of those new rows with 0s?
Here is a snippet of what I have done, which raises ValueError: cannot reindex from a duplicate axis.
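For reference, that error usually means the index being reindexed contains repeated values, which would happen here if the same date appears for more than one article or outlet. One possible way around it, sketched below, is to build a frame of every calendar day and left-merge the data onto it instead of reindexing. This is only a sketch under assumptions: the sample frame stands in for the CSV, the row with Article_ID 1133 is hypothetical and added only to create a duplicate date, and the names all_dates and filled are illustrative.

```python
import pandas as pd

# Stand-in for the CSV: rows copied from the head() output above, plus one
# hypothetical row (Article_ID 1133) so that a date appears twice.
df = pd.DataFrame({
    "Date": [20170817, 20170817, 20170818, 20170821, 20170822],
    "Article_ID": [1132, 1133, 1132, 1132, 1132],
    "Country_Code": ["AT", "AT", "AT", "AT", "AT"],
    "Sold_Units": [1, 1, 1, 1, 1],
})

# The integers look like 20170817, so the matching format is %Y%m%d.
df["Date"] = pd.to_datetime(df["Date"].astype(str), format="%Y%m%d")

# One row per calendar day between the first and last observed date.
all_dates = pd.DataFrame(
    {"Date": pd.date_range(df["Date"].min(), df["Date"].max(), freq="D")}
)

# A left merge keeps every day; days without data get NaN in the other columns.
# Unlike reindex, a merge does not require the dates in df to be unique.
filled = all_dates.merge(df, on="Date", how="left")

# Fill the gaps column by column. The text column is filled with the string "0"
# (an assumption) so that it keeps a single type.
filled = filled.fillna({"Article_ID": 0, "Country_Code": "0", "Sold_Units": 0})

# The NaNs turned the integer columns into floats, so cast them back.
filled = filled.astype({"Article_ID": "int64", "Sold_Units": "int64"})

print(filled)
```

If every date occurs only once, df.set_index('Date').reindex(full_range, fill_value=0) with a full_range built by pd.date_range should also work; the merge is used here only because it tolerates repeated dates.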
