fill missing date column in python dataframe

Question

Need help in filling gaps for missing beginning date of a month in df_1 (for example: 01, 02, 05, and 07 to 11), I need to have a continuous months (i.e. 12) .

In: df_1 = pd.DataFrame([['2021-03-01', 'Supp_1', 'Product_1', '1'],
                 ['2021-04-01', 'Supp_1', 'Product_1', 1],
                 ['2021-06-01','Supp_1', 'Product_1', 1],
                 ['2021-12-01', 'Supp_1', 'Product_1', 1.25]],
                columns=['Date','Supplier','Product','Cost'])
Out: 

       Date     Supplier    Product     Cost
0   2021-03-01  Supp_1  Product_1   1
1   2021-04-01  Supp_1  Product_1   1
2   2021-06-01  Supp_1  Product_1   1
3   2021-12-01  Supp_1  Product_1   1.25

Expected result is,

      Date     Supplier Product     Cost
0   2021-01-01  Supp_1  Product_1   1
1   2021-02-01  Supp_1  Product_1   1
2   2021-03-01  Supp_1  Product_1   
3   2021-04-01  Supp_1  Product_1   
4   2021-05-01  Supp_1  Product_1   
5   2021-06-01  Supp_1  Product_1   1
6   2021-07-01  Supp_1  Product_1   
7   2021-08-01  Supp_1  Product_1   
8   2021-09-01  Supp_1  Product_1   
9   2021-10-01  Supp_1  Product_1   
10  2021-11-01  Supp_1  Product_1   
11  2021-12-01  Supp_1  Product_1   1.25

Once we have the df_2, then I can use ffill() and bfill() to fill the gaps for 'Cost'

You want to fill in the missing dates. How would you fill the Supplier and Product columns? — gtomer
– gtomer, Commented Oct 14, 2021 at 15:23
I guess I closed maybe to fast, this requires quite a few steps, sorry about that — mozway
– mozway, Commented Oct 14, 2021 at 15:27

sammywemmy · Accepted Answer · 2021-10-14 21:18:09Z

If I get you correctly, you want to expose the missing rows, based on a combination of all the months in the year, along with Supplier and Product, and then forward/backward fill on the Cost column.

Maybe the complete function from pyjanitor could help:

# pip git+https://github.com/pyjanitor-devs/pyjanitor.git
import pandas as pd
import janitor as jn

year = df.Date.dt.year.at[0]
months = pd.date_range(f"{year}-01-01", f"{year}-12-01", freq="MS")
months = dict(Date = months)

 df.complete(months, 'Supplier', 'Product', sort = True)
 
         Date Supplier    Product  Cost
0  2021-01-01   Supp_1  Product_1   NaN
1  2021-02-01   Supp_1  Product_1   NaN
2  2021-03-01   Supp_1  Product_1     1
3  2021-04-01   Supp_1  Product_1     1
4  2021-05-01   Supp_1  Product_1   NaN
5  2021-06-01   Supp_1  Product_1     1
6  2021-07-01   Supp_1  Product_1   NaN
7  2021-08-01   Supp_1  Product_1   NaN
8  2021-09-01   Supp_1  Product_1   NaN
9  2021-10-01   Supp_1  Product_1   NaN
10 2021-11-01   Supp_1  Product_1   NaN
11 2021-12-01   Supp_1  Product_1  1.25

You can then fill up or down on the Cost column.

Henry Yik · Accepted Answer · 2021-10-14 15:25:57Z

0

You can use resample:

print (df_1.assign(Date=pd.to_datetime(df_1["Date"]))
           .set_index("Date")
           .resample("MS").asfreq()
           .reset_index())

         Date Supplier    Product  Cost
0  2021-01-01   Supp_1  Product_1     1
1  2021-02-01   Supp_1  Product_1     1
2  2021-03-01      NaN        NaN   NaN
3  2021-04-01      NaN        NaN   NaN
4  2021-05-01      NaN        NaN   NaN
5  2021-06-01   Supp_1  Product_1     1
6  2021-07-01      NaN        NaN   NaN
7  2021-08-01      NaN        NaN   NaN
8  2021-09-01      NaN        NaN   NaN
9  2021-10-01      NaN        NaN   NaN
10 2021-11-01      NaN        NaN   NaN
11 2021-12-01   Supp_1  Product_1  1.25

answered Oct 14, 2021 at 15:25

Henry Yik

22.6k5 gold badges21 silver badges44 bronze badges

1 Comment

spartanboy Over a year ago

Could add ffill() for 'Supplier' and 'Product'. This works for the mentioned example. lets say there is no data for 2021-01-01 or 2021-02-01, the code you provided wont generate these 2 missing months.

mozway · Accepted Answer · 2021-10-14 15:30:14Z

0

You can use this pipeline. The key step is to set_index Date and use asfreq:

(df1.assign(Date=pd.to_datetime(df1['Date']))
    .set_index('Date')
    .asfreq('MS')
    .assign(Supplier=lambda d: d['Supplier'].ffill(),
            Product=lambda d: d['Product'].ffill()
           )
    .reset_index()
)

output:

         Date Supplier    Product  Cost
0  2021-01-01   Supp_1  Product_1     1
1  2021-02-01   Supp_1  Product_1     1
2  2021-03-01   Supp_1  Product_1   NaN
3  2021-04-01   Supp_1  Product_1   NaN
4  2021-05-01   Supp_1  Product_1   NaN
5  2021-06-01   Supp_1  Product_1     1
6  2021-07-01   Supp_1  Product_1   NaN
7  2021-08-01   Supp_1  Product_1   NaN
8  2021-09-01   Supp_1  Product_1   NaN
9  2021-10-01   Supp_1  Product_1   NaN
10 2021-11-01   Supp_1  Product_1   NaN
11 2021-12-01   Supp_1  Product_1  1.25

answered Oct 14, 2021 at 15:30

mozway

267k13 gold badges56 silver badges106 bronze badges

2 Comments

Muhammad Hassan Over a year ago

Oh yes, got it..

spartanboy Over a year ago

This works for the mentioned example. lets say there is no data for 2021-01-01 or 2021-02-01, the code you provided wont generate these 2 missing months.

gtomer · Accepted Answer · 2021-10-14 15:39:48Z

0

Another option:

df_1.Date = pd.to_datetime(df_1.Date)
df_1 = df_1.set_index('Date').asfreq('MS').reset_index()
df_1

    Date    Supplier    Product Cost
0   2021-01-01  Supp_1  Product_1   1
1   2021-01-02  NaN NaN NaN
2   2021-01-03  NaN NaN NaN
3   2021-01-04  NaN NaN NaN
4   2021-01-05  NaN NaN NaN
... ... ... ... ...
330 2021-11-27  NaN NaN NaN
331 2021-11-28  NaN NaN NaN
332 2021-11-29  NaN NaN NaN
333 2021-11-30  NaN NaN NaN
334 2021-12-01  Supp_1  Product_1   1.25
335 rows × 4 columns

edited Oct 14, 2021 at 15:39

answered Oct 14, 2021 at 15:32

gtomer

6,6041 gold badge14 silver badges29 bronze badges

3 Comments

mozway Over a year ago

I think the wanted freq is 'MS' not 'D' ;)

gtomer Over a year ago

You are right. Correcting. Thanks

spartanboy Over a year ago

Could add ffill() for 'Supplier' and 'Product'. This works for the mentioned example. lets say there is no data for 2021-01-01 or 2021-02-01, the code you provided wont generate these 2 missing months.

Collectives™ on Stack Overflow

fill missing date column in python dataframe

4 Answers 4

Comments

1 Comment

2 Comments

3 Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

4 Answers 4

Comments

1 Comment

2 Comments

3 Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related