0

Need help in filling gaps for missing beginning date of a month in df_1 (for example: 01, 02, 05, and 07 to 11), I need to have a continuous months (i.e. 12) .

In: df_1 = pd.DataFrame([['2021-03-01', 'Supp_1', 'Product_1', '1'],
                 ['2021-04-01', 'Supp_1', 'Product_1', 1],
                 ['2021-06-01','Supp_1', 'Product_1', 1],
                 ['2021-12-01', 'Supp_1', 'Product_1', 1.25]],
                columns=['Date','Supplier','Product','Cost'])
Out: 

       Date     Supplier    Product     Cost
0   2021-03-01  Supp_1  Product_1   1
1   2021-04-01  Supp_1  Product_1   1
2   2021-06-01  Supp_1  Product_1   1
3   2021-12-01  Supp_1  Product_1   1.25

Expected result is,

      Date     Supplier Product     Cost
0   2021-01-01  Supp_1  Product_1   1
1   2021-02-01  Supp_1  Product_1   1
2   2021-03-01  Supp_1  Product_1   
3   2021-04-01  Supp_1  Product_1   
4   2021-05-01  Supp_1  Product_1   
5   2021-06-01  Supp_1  Product_1   1
6   2021-07-01  Supp_1  Product_1   
7   2021-08-01  Supp_1  Product_1   
8   2021-09-01  Supp_1  Product_1   
9   2021-10-01  Supp_1  Product_1   
10  2021-11-01  Supp_1  Product_1   
11  2021-12-01  Supp_1  Product_1   1.25

Once we have the df_2, then I can use ffill() and bfill() to fill the gaps for 'Cost'

2
  • You want to fill in the missing dates. How would you fill the Supplier and Product columns? Commented Oct 14, 2021 at 15:23
  • I guess I closed maybe to fast, this requires quite a few steps, sorry about that Commented Oct 14, 2021 at 15:27

4 Answers 4

1

If I get you correctly, you want to expose the missing rows, based on a combination of all the months in the year, along with Supplier and Product, and then forward/backward fill on the Cost column.

Maybe the complete function from pyjanitor could help:

# pip git+https://github.com/pyjanitor-devs/pyjanitor.git
import pandas as pd
import janitor as jn

year = df.Date.dt.year.at[0]
months = pd.date_range(f"{year}-01-01", f"{year}-12-01", freq="MS")
months = dict(Date = months)

 df.complete(months, 'Supplier', 'Product', sort = True)
 
         Date Supplier    Product  Cost
0  2021-01-01   Supp_1  Product_1   NaN
1  2021-02-01   Supp_1  Product_1   NaN
2  2021-03-01   Supp_1  Product_1     1
3  2021-04-01   Supp_1  Product_1     1
4  2021-05-01   Supp_1  Product_1   NaN
5  2021-06-01   Supp_1  Product_1     1
6  2021-07-01   Supp_1  Product_1   NaN
7  2021-08-01   Supp_1  Product_1   NaN
8  2021-09-01   Supp_1  Product_1   NaN
9  2021-10-01   Supp_1  Product_1   NaN
10 2021-11-01   Supp_1  Product_1   NaN
11 2021-12-01   Supp_1  Product_1  1.25

You can then fill up or down on the Cost column.

Sign up to request clarification or add additional context in comments.

Comments

0

You can use resample:

print (df_1.assign(Date=pd.to_datetime(df_1["Date"]))
           .set_index("Date")
           .resample("MS").asfreq()
           .reset_index())

         Date Supplier    Product  Cost
0  2021-01-01   Supp_1  Product_1     1
1  2021-02-01   Supp_1  Product_1     1
2  2021-03-01      NaN        NaN   NaN
3  2021-04-01      NaN        NaN   NaN
4  2021-05-01      NaN        NaN   NaN
5  2021-06-01   Supp_1  Product_1     1
6  2021-07-01      NaN        NaN   NaN
7  2021-08-01      NaN        NaN   NaN
8  2021-09-01      NaN        NaN   NaN
9  2021-10-01      NaN        NaN   NaN
10 2021-11-01      NaN        NaN   NaN
11 2021-12-01   Supp_1  Product_1  1.25

1 Comment

Could add ffill() for 'Supplier' and 'Product'. This works for the mentioned example. lets say there is no data for 2021-01-01 or 2021-02-01, the code you provided wont generate these 2 missing months.
0

You can use this pipeline. The key step is to set_index Date and use asfreq:

(df1.assign(Date=pd.to_datetime(df1['Date']))
    .set_index('Date')
    .asfreq('MS')
    .assign(Supplier=lambda d: d['Supplier'].ffill(),
            Product=lambda d: d['Product'].ffill()
           )
    .reset_index()
)

output:

         Date Supplier    Product  Cost
0  2021-01-01   Supp_1  Product_1     1
1  2021-02-01   Supp_1  Product_1     1
2  2021-03-01   Supp_1  Product_1   NaN
3  2021-04-01   Supp_1  Product_1   NaN
4  2021-05-01   Supp_1  Product_1   NaN
5  2021-06-01   Supp_1  Product_1     1
6  2021-07-01   Supp_1  Product_1   NaN
7  2021-08-01   Supp_1  Product_1   NaN
8  2021-09-01   Supp_1  Product_1   NaN
9  2021-10-01   Supp_1  Product_1   NaN
10 2021-11-01   Supp_1  Product_1   NaN
11 2021-12-01   Supp_1  Product_1  1.25

2 Comments

Oh yes, got it..
This works for the mentioned example. lets say there is no data for 2021-01-01 or 2021-02-01, the code you provided wont generate these 2 missing months.
0

Another option:

df_1.Date = pd.to_datetime(df_1.Date)
df_1 = df_1.set_index('Date').asfreq('MS').reset_index()
df_1

    Date    Supplier    Product Cost
0   2021-01-01  Supp_1  Product_1   1
1   2021-01-02  NaN NaN NaN
2   2021-01-03  NaN NaN NaN
3   2021-01-04  NaN NaN NaN
4   2021-01-05  NaN NaN NaN
... ... ... ... ...
330 2021-11-27  NaN NaN NaN
331 2021-11-28  NaN NaN NaN
332 2021-11-29  NaN NaN NaN
333 2021-11-30  NaN NaN NaN
334 2021-12-01  Supp_1  Product_1   1.25
335 rows × 4 columns

3 Comments

I think the wanted freq is 'MS' not 'D' ;)
You are right. Correcting. Thanks
Could add ffill() for 'Supplier' and 'Product'. This works for the mentioned example. lets say there is no data for 2021-01-01 or 2021-02-01, the code you provided wont generate these 2 missing months.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.