Adding rows to a pandas DataFrame if they don't exist, based on conditions on 2 columns

Question

I am working on a pandas DataFrame that looks like this:

Product_id   month    sales
1            01-2018  25
1            02-2018  34
1            03-2018  29
1            04-2018  45
2            02-2018  3
2            04-2018  2

The sales run from 01-2018 to 09-2020, but product 2 wasn't sold in 03-2018. So I am trying to add a row with Product_id = 2, month = 03-2018 and sales = 0. I don't want to add a row for 01-2018: if the first sale is in 02-2018, it means the product wasn't available in 01-2018.

I got the month of the first sale for each product with this code:

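df['month'] = pd.to_datetime(df['month'], format='%m-%Y')  # parse 'MM-YYYY' so min() is chronological
df.groupby('Product_id')['month'].min().reset_index()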
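Because `month` holds 'MM-YYYY' strings in the original data, calling `min()` on the raw column compares text rather than dates, so a later year can win whenever its month number is smaller. A minimal sketch with hypothetical toy data (one product sold in 12-2018 and 02-2019) shows the difference; parsing with `pd.to_datetime(..., format='%m-%Y')` makes the comparison chronological.

```python
import pandas as pd

# Hypothetical toy data: one product sold in 12-2018 and 02-2019
df = pd.DataFrame({'Product_id': [1, 1],
                   'month': ['12-2018', '02-2019'],
                   'sales': [5, 7]})

# On strings, min() compares character by character, so '02-2019' < '12-2018'
as_strings = df.groupby('Product_id')['month'].min()

# After parsing with an explicit month-year format, min() is chronological
df['month'] = pd.to_datetime(df['month'], format='%m-%Y')
as_dates = df.groupby('Product_id')['month'].min()

print(as_strings.loc[1])  # 02-2019
print(as_dates.loc[1])    # 2018-12-01 00:00:00
```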

Now I'm trying to add a row for each product and each month where the data doesn't exist. Nothing I've tried works well yet. Any ideas are welcome.

Thanks in advance (and sorry for my approximate English!)

Do you need the same range for all Product_id values, like in my answer? – jezrael, Oct 22, 2020 at 10:54

jezrael · Accepted Answer · 2020-10-22 10:52:29Z

Use:

print (df)
   Product_id    month  sales
0           1  01-2018     25
1           1  02-2018     34
2           1  06-2018     29 <- changed dates
3           1  04-2018     45
4           2  02-2018      3
5           2  04-2018      2

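import pandas as pd

# parse the 'MM-YYYY' strings explicitly so each month maps to its first day
df['month'] = pd.to_datetime(df['month'], format='%m-%Y')

df = (df.set_index(['month','Product_id'])['sales']
        .unstack(fill_value=0)        # wide table: one column per product, missing pairs -> 0
        .asfreq('MS', fill_value=0)   # insert every missing month start between the min and max month
        .unstack()                    # back to long form, indexed by (Product_id, month)
        .reset_index(name='value'))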
print (df)
    Product_id      month  value
0            1 2018-01-01     25
1            1 2018-02-01     34
2            1 2018-03-01      0
3            1 2018-04-01     45
4            1 2018-05-01      0
5            1 2018-06-01     29
6            2 2018-01-01      0
7            2 2018-02-01      3
8            2 2018-03-01      0
9            2 2018-04-01      2
10           2 2018-05-01      0
11           2 2018-06-01      0
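The chain above gives every product the same month range, so product 2 also receives a 2018-01-01 row, which the question wanted to avoid (hence the comment asking about it). If each product should instead start at its own first sale and run to 09-2020, one possible sketch reuses the question's `groupby(...).min()`, builds every (product, month-start) pair with `pd.date_range(..., freq='MS')`, and reindexes. The fixed end month `END` is an assumption taken from the question's stated 01-2018 to 09-2020 range.

```python
import pandas as pd

# The question's sample data, with 'month' parsed from 'MM-YYYY' strings
df = pd.DataFrame({'Product_id': [1, 1, 1, 1, 2, 2],
                   'month': ['01-2018', '02-2018', '03-2018', '04-2018',
                             '02-2018', '04-2018'],
                   'sales': [25, 34, 29, 45, 3, 2]})
df['month'] = pd.to_datetime(df['month'], format='%m-%Y')

# Assumption: the data always ends at 09-2020, as stated in the question
END = pd.Timestamp('2020-09-01')

# First month with a sale, per product
first_sale = df.groupby('Product_id')['month'].min()

# Every (product, month-start) pair from that product's first sale through END
full_index = pd.MultiIndex.from_tuples(
    [(pid, m) for pid, start in first_sale.items()
     for m in pd.date_range(start, END, freq='MS')],
    names=['Product_id', 'month'])

# Reindex the sales; pairs that were missing get sales = 0
out = (df.set_index(['Product_id', 'month'])['sales']
         .reindex(full_index, fill_value=0)
         .reset_index())

print(out.head(8))
```

With the sample data this gives 33 rows for product 1 (01-2018 to 09-2020) and 32 for product 2 (02-2018 to 09-2020): 03-2018 is filled with 0 for product 2, and no 01-2018 row is created for it.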

Amith Lakkakula · Accepted Answer · 2020-10-22 11:25:40Z

I can't think of a direct solution, but you can use the following code snippet:

import pandas as pd

df = pd.DataFrame([{'Product_id': 1, 'month': '01-2018', 'sales': 25},
                   {'Product_id': 1, 'month': '02-2018', 'sales': 34},
                   {'Product_id': 1, 'month': '03-2018', 'sales': 29},
                   {'Product_id': 1, 'month': '04-2018', 'sales': 45},
                   {'Product_id': 2, 'month': '02-2018', 'sales': 3},
                   {'Product_id': 2, 'month': '04-2018', 'sales': 2}])

# Keep separate columns for month and year; this makes the groupby simple.
# You could also convert the 'month' column to a datetime instead.
df[['month_no', 'year']] = df['month'].str.split('-', expand=True)
df['month_no'] = df['month_no'].astype(int)
df['year'] = df['year'].astype(int)

unique_product_ids = df['Product_id'].unique()
unique_years = df['year'].unique()
grpby_df = df.groupby(by=['Product_id', 'year'])

new_rows = []
for unique_product_id in unique_product_ids:
    for unique_year in unique_years:
        try:
            subset_df = grpby_df.get_group((unique_product_id, unique_year))
        except KeyError:
            continue  # no sales for this product in this year
        start_month = subset_df['month_no'].min()
        end_month = 12  # Assuming sales=0 for all subsequent months
        months_list = list(subset_df['month_no'])
        # this loop must sit inside the year loop so every year is processed
        for i in range(start_month, end_month + 1):
            if i not in months_list:
                new_rows.append({'Product_id': unique_product_id,
                                 'month': f'{i:02d}-{unique_year}',
                                 'month_no': i,
                                 'year': unique_year,
                                 'sales': 0})

# DataFrame.append was removed in pandas 2.0: collect the rows and concat once
df = pd.concat([df, pd.DataFrame(new_rows)], ignore_index=True)

You'll get 23 rows in total: 12 for product 1 and 11 for product 2 (month 1 is skipped for product 2 because its first sale is in month 2).
