Add rows for missing data grouped by another column in Pandas DataFrame

Question

I have a pandas DataFrame in which some products are missing for some dates. I want to add a row for each missing (date, product) combination and give it a sales value of 0. How can I do that?

# Sample dataframe
import pandas as pd
df = pd.DataFrame({
    'date': ['2020-01-01', '2020-01-01', '2020-01-01', '2020-01-02', '2020-01-02', '2020-01-03', '2020-01-03'],
    'product': ['glass', 'clothes', 'food', 'glass', 'food', 'glass', 'clothes'],
    'sales': [100, 120, 50, 90, 60, 110, 130]
})

        date    product sales
0   2020-01-01  glass   100
1   2020-01-01  clothes 120
2   2020-01-01  food    50
3   2020-01-02  glass   90
4   2020-01-02  food    60
5   2020-01-03  glass   110
6   2020-01-03  clothes 130

## 'clothes' is missing for 2020-01-02 and 'food' is missing for 2020-01-03
## What I want to get: 
        date    product sales
0   2020-01-01  glass   100
1   2020-01-01  clothes 120
2   2020-01-01  food    50
3   2020-01-02  glass   90
4   2020-01-02  clothes 0
5   2020-01-02  food    60
6   2020-01-03  glass   110
7   2020-01-03  clothes 130
8   2020-01-03  food    0

Quang Hoang · Accepted Answer · 2020-06-25 20:28:07Z

2

You can do this with unstack()/stack():

(df.set_index(['date','product'])
   .unstack(fill_value=0)
   .stack()
   .reset_index()
)

Output:

         date  product  sales
0  2020-01-01  clothes    120
1  2020-01-01     food     50
2  2020-01-01    glass    100
3  2020-01-02  clothes      0
4  2020-01-02     food     60
5  2020-01-02    glass     90
6  2020-01-03  clothes    130
7  2020-01-03     food      0
8  2020-01-03    glass    110
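To see why this works, it can help to look at the intermediate wide table. The following is a minimal sketch using the sample data from the question; the variable names `wide` and `tidy` are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'date': ['2020-01-01', '2020-01-01', '2020-01-01', '2020-01-02',
             '2020-01-02', '2020-01-03', '2020-01-03'],
    'product': ['glass', 'clothes', 'food', 'glass', 'food', 'glass', 'clothes'],
    'sales': [100, 120, 50, 90, 60, 110, 130],
})

# Step 1: move 'product' from the row index into the columns.
# Every date now has one cell per product; cells with no source row get 0.
wide = df.set_index(['date', 'product']).unstack(fill_value=0)
print(wide)

# Step 2: move 'product' back into the rows. The filled cells come back
# as ordinary rows, so each date now lists every product.
tidy = wide.stack().reset_index()
print(tidy)
```

Because the gaps are filled during unstack(), there are no NaN values for stack() to deal with. Note that this approach needs each (date, product) pair to appear at most once.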

BENY · Accepted Answer · 2020-06-25 20:35:56Z

2

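Try with pivot:

# Originally written as df.pivot(*df.columns); pandas 2.0+ requires keyword arguments here
df=df.pivot(index='date', columns='product', values='sales').fillna(0).stack().to_frame('sales').reset_index()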
df
Out[120]: 
         date  product  sales
0  2020-01-01  clothes  120.0
1  2020-01-01     food   50.0
2  2020-01-01    glass  100.0
3  2020-01-02  clothes    0.0
4  2020-01-02     food   60.0
5  2020-01-02    glass   90.0
6  2020-01-03  clothes  130.0
7  2020-01-03     food    0.0
8  2020-01-03    glass  110.0
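The sales values above are floats (120.0, 50.0, ...) because pivot leaves NaN in the missing cells, which forces a float dtype before fillna(0) runs. If integers are wanted, one option is to cast after filling. This is a sketch on the question's sample data, with `result` as an illustrative name:

```python
import pandas as pd

df = pd.DataFrame({
    'date': ['2020-01-01', '2020-01-01', '2020-01-01', '2020-01-02',
             '2020-01-02', '2020-01-03', '2020-01-03'],
    'product': ['glass', 'clothes', 'food', 'glass', 'food', 'glass', 'clothes'],
    'sales': [100, 120, 50, 90, 60, 110, 130],
})

result = (
    df.pivot(index='date', columns='product', values='sales')
      .fillna(0)          # missing (date, product) cells become 0.0
      .astype('int64')    # cast back to integers once no NaN remains
      .stack()
      .to_frame('sales')
      .reset_index()
)
print(result)
```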

3 Comments

Guru Over a year ago

What does *df.columns stand for?

BENY Over a year ago

@Guru It unpacks the column names into positional arguments, so it is equal to df.pivot('date', 'product', 'sales') :-)

Guru Over a year ago

Sorry, I am still learning. When I study the syntax of pivot, I see it needs index, columns and values. When we specify *df.columns, what goes to index and what goes to columns?
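A short sketch that may help with this question: the star unpacks the column labels in order, and older pandas versions matched them positionally to pivot's parameters (index, columns, values). Since pandas 2.0 these arguments are keyword-only, so the example below spells the mapping out by name. The names `args`, `index_col`, `columns_col` and `values_col` are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'date': ['2020-01-01', '2020-01-01', '2020-01-01', '2020-01-02',
             '2020-01-02', '2020-01-03', '2020-01-03'],
    'product': ['glass', 'clothes', 'food', 'glass', 'food', 'glass', 'clothes'],
    'sales': [100, 120, 50, 90, 60, 110, 130],
})

# The star operator spreads the column labels out as separate arguments.
args = [*df.columns]
print(args)

# First label -> index, second -> columns, third -> values.
index_col, columns_col, values_col = df.columns
wide = df.pivot(index=index_col, columns=columns_col, values=values_col)
print(wide)
```

So 'date' becomes the row index, 'product' becomes the column headers, and 'sales' fills the cells. This only lines up because the DataFrame has exactly these three columns in this order.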

Scott Boston · Accepted Answer · 2020-06-25 20:44:55Z

1

Use set_index with reindex:

(df.set_index(['date', 'product'])
   .reindex(pd.MultiIndex.from_product([df['date'].unique(), 
                                        df['product'].unique()], 
                                       names=['date', 'product']), 
            fill_value=0)
   .reset_index())

Output:

         date  product  sales
0  2020-01-01    glass    100
1  2020-01-01  clothes    120
2  2020-01-01     food     50
3  2020-01-02    glass     90
4  2020-01-02  clothes      0
5  2020-01-02     food     60
6  2020-01-03    glass    110
7  2020-01-03  clothes    130
8  2020-01-03     food      0
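The same idea can be wrapped in a small function for reuse. The helper below, `fill_missing_combinations`, is a hypothetical name and not part of pandas; it is a sketch that assumes every combination of key values appears at most once in the input, since reindex cannot work on duplicate index labels.

```python
import pandas as pd

def fill_missing_combinations(df, keys, fill_value=0):
    """Hypothetical helper: add a row for every missing combination of `keys`.

    Assumes each combination of key values occurs at most once in `df`.
    """
    # Cartesian product of the distinct values of each key column,
    # in order of first appearance.
    full_index = pd.MultiIndex.from_product(
        [df[key].unique() for key in keys], names=keys
    )
    return (
        df.set_index(keys)
          .reindex(full_index, fill_value=fill_value)
          .reset_index()
    )

df = pd.DataFrame({
    'date': ['2020-01-01', '2020-01-01', '2020-01-01', '2020-01-02',
             '2020-01-02', '2020-01-03', '2020-01-03'],
    'product': ['glass', 'clothes', 'food', 'glass', 'food', 'glass', 'clothes'],
    'sales': [100, 120, 50, 90, 60, 110, 130],
})

result = fill_missing_combinations(df, ['date', 'product'])
print(result)
```

Unlike the unstack() and pivot approaches, reindexing keeps the products in their original order of appearance rather than sorting them, which matches the layout requested in the question.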
