I am working with a pandas DataFrame that looks like this:

Product_id   month    sales
1            01-2018  25
1            02-2018  34
1            03-2018  29
1            04-2018  45
2            02-2018  3
2            04-2018  2

The sales run from 01-2018 to 09-2020, but product 2 was not sold in 03-2018, so I am trying to add a row with Product_id 2, month = 03-2018 and sales = 0. I don't want to add a row for 01-2018: if the first sale is in 02-2018, it means the product wasn't available in 01-2018.

I got the month of the first sale for each product with this code:

df.groupby('Product_id').month.min().reset_index()

Now I'm trying to add a row for each product and each month where the data doesn't exist. Nothing I've tried works well yet. Any ideas are welcome.

Thanks in advance (and sorry for my approximate English!)

  • Do you need same ranges for all Product_id like in my answer? Commented Oct 22, 2020 at 10:54

2 Answers

Use:

print (df)
   Product_id    month  sales
0           1  01-2018     25
1           1  02-2018     34
2           1  06-2018     29 <- changed dates
3           1  04-2018     45
4           2  02-2018      3
5           2  04-2018      2

# Parse 'MM-YYYY' strings explicitly so the month/year order is unambiguous
df['month'] = pd.to_datetime(df['month'], format='%m-%Y')

df = (df.set_index(['month','Product_id'])['sales']
        .unstack(fill_value=0)
        .asfreq('MS', fill_value=0)
        .unstack()
        .reset_index(name='value'))
print (df)
    Product_id      month  value
0            1 2018-01-01     25
1            1 2018-02-01     34
2            1 2018-03-01      0
3            1 2018-04-01     45
4            1 2018-05-01      0
5            1 2018-06-01     29
6            2 2018-01-01      0
7            2 2018-02-01      3
8            2 2018-03-01      0
9            2 2018-04-01      2
10           2 2018-05-01      0
11           2 2018-06-01      0
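
Note that the reshaping above fills every product over the same range, including months before a product's first sale. If, as in the question, months before the first sale should be left out, one possible follow-up is to filter against each product's first month. A minimal sketch, assuming the question's sample data and an explicit `%m-%Y` format (the names `first_sale`, `full` and `result` are only illustrative):

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'Product_id': [1, 1, 1, 1, 2, 2],
    'month': ['01-2018', '02-2018', '03-2018', '04-2018', '02-2018', '04-2018'],
    'sales': [25, 34, 29, 45, 3, 2],
})
df['month'] = pd.to_datetime(df['month'], format='%m-%Y')

# First month with a sale for each product (computed on real dates,
# not on the 'MM-YYYY' strings, which would compare lexicographically)
first_sale = df.groupby('Product_id')['month'].min()

# Full product x month grid over the observed range, missing sales set to 0
full = (df.set_index(['month', 'Product_id'])['sales']
          .unstack(fill_value=0)
          .asfreq('MS', fill_value=0)
          .unstack()
          .reset_index(name='sales'))

# Keep only the months on or after each product's first sale
mask = full['month'] >= full['Product_id'].map(first_sale)
result = full[mask].reset_index(drop=True)
print(result)
```

The grid only spans the earliest to the latest month present in the data, so trailing months with no sales for any product would still need a separate `reindex` over the desired range.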


I can't think of any direct solution. You can use the following code snippet:

import pandas as pd

df = pd.DataFrame([{'Product_id': 1, 'month': '01-2018', 'sales': 25},
                   {'Product_id': 1, 'month': '02-2018', 'sales': 34},
                   {'Product_id': 1, 'month': '03-2018', 'sales': 29},
                   {'Product_id': 1, 'month': '04-2018', 'sales': 45},
                   {'Product_id': 2, 'month': '02-2018', 'sales': 3},
                   {'Product_id': 2, 'month': '04-2018', 'sales': 2}])


# Keep separate columns for month and year, which makes the groupby easier.
# You could also convert the 'month' column to a date object instead.
df[['month_no','year']] = df.month.str.split('-', expand=True)
df['month_no'] = df['month_no'].astype(int) 
df['year'] = df['year'].astype(int) 

unique_product_ids = df['Product_id'].unique()
unique_years = df['year'].unique()
grpby_df = df.groupby(by=['Product_id','year'])

new_rows = []
for unique_product_id in unique_product_ids:
    for unique_year in unique_years:
        try:
            subset_df = grpby_df.get_group((unique_product_id, unique_year))
        except KeyError:
            continue
        start_month = min(subset_df['month_no'])
        end_month = 12  # Assuming sales=0 for all subsequent months
        months_list = list(subset_df['month_no'])
        for i in range(start_month, end_month + 1):
            if i not in months_list:
                # Collect the missing months; DataFrame.append was removed in pandas 2.0
                new_rows.append({
                    'Product_id': unique_product_id,
                    'month': f'{i:02d}-{unique_year}',
                    'month_no': i,
                    'year': unique_year,
                    'sales': 0,
                })

# Add all missing rows in one step
df = pd.concat([df, pd.DataFrame(new_rows)], ignore_index=True)

You'll get a total of 23 rows as a result: 12 for product 1 and 11 for product 2 (since we ignore the first month).
