0

I have a Pandas dataframe where for certain dates certain products are missing. I want to add those rows to the dataframe and assign them a sales value of 0. How can I do that?

# Sample dataframe
import pandas as pd
df = pd.DataFrame({
    'date': ['2020-01-01', '2020-01-01', '2020-01-01', '2020-01-02', '2020-01-02', '2020-01-03', '2020-01-03'],
    'product': ['glass', 'clothes', 'food', 'glass', 'food', 'glass', 'clothes'],
    'sales': [100, 120, 50, 90, 60, 110, 130]
})

        date    product sales
0   2020-01-01  glass   100
1   2020-01-01  clothes 120
2   2020-01-01  food    50
3   2020-01-02  glass   90
4   2020-01-02  food    60
5   2020-01-03  glass   110
6   2020-01-03  clothes 130

## 'clothes' is missing for 2020-01-02 and 'food' is missing for 2020-01-03
## What I want to get: 
        date    product sales
0   2020-01-01  glass   100
1   2020-01-01  clothes 120
2   2020-01-01  food    50
3   2020-01-02  glass   90
4   2020-01-02  clothes 0
5   2020-01-02  food    60
6   2020-01-03  glass   110
7   2020-01-03  clothes 130
8   2020-01-03  food    0

3 Answers 3

2

You can do with unstack()/stack():

(df.set_index(['date','product'])
   .unstack(fill_value=0)
   .stack()
   .reset_index()
)

Output:

         date  product  sales
0  2020-01-01  clothes    120
1  2020-01-01     food     50
2  2020-01-01    glass    100
3  2020-01-02  clothes      0
4  2020-01-02     food     60
5  2020-01-02    glass     90
6  2020-01-03  clothes    130
7  2020-01-03     food      0
8  2020-01-03    glass    110
Sign up to request clarification or add additional context in comments.

Comments

2

Try with pivot

df=df.pivot(*df.columns).fillna(0).stack().to_frame('sales').reset_index()
df
Out[120]: 
         date  product  sales
0  2020-01-01  clothes  120.0
1  2020-01-01     food   50.0
2  2020-01-01    glass  100.0
3  2020-01-02  clothes    0.0
4  2020-01-02     food   60.0
5  2020-01-02    glass   90.0
6  2020-01-03  clothes  130.0
7  2020-01-03     food    0.0
8  2020-01-03    glass  110.0

3 Comments

What does *df.columns stand for?
@Guru equal to df.pivot(df.columns.tolist()) :-)
Sorry, i am still learning. When i study syntax of pivot, i see it needs index, columns and values. When we specify *df.columns, what goes to index, what goes to columns?
1

Use set_index with reindex:

(df.set_index(['date', 'product'])
   .reindex(pd.MultiIndex.from_product([df['date'].unique(), 
                                        df['product'].unique()], 
                                       names=['date', 'product']), 
            fill_value=0)
   .reset_index())

Output:

         date  product  sales
0  2020-01-01    glass    100
1  2020-01-01  clothes    120
2  2020-01-01     food     50
3  2020-01-02    glass     90
4  2020-01-02  clothes      0
5  2020-01-02     food     60
6  2020-01-03    glass    110
7  2020-01-03  clothes    130
8  2020-01-03     food      0

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.