Python: Pandas Dataframe, groupby but keeping otherwise missing values

Question

My DataFrame looks something like this:

       date    |  status  |  value
0      2020-01 |  true    |  3
1      2020-01 |  true    |  8
2      2020-02 |  true    |  5
3      2020-01 |  false   |  1
4      2020-01 |  false   |  11

For most functions I would call groupby...

df_group = df.groupby(['date', 'status'])[['value']].agg('sum').reset_index()

...and obtain this:

       date    |  status  |  value
0      2020-01 |  true    |  11
1      2020-02 |  true    |  5
2      2020-01 |  false   |  12

But what I want is a DataFrame that also lists the 'missing' combinations, like this:

       date    |  status  |  value
0      2020-01 |  true    |  11
1      2020-02 |  true    |  5
2      2020-01 |  false   |  12
3      2020-02 |  false   |  0

Is groupby even the correct function?

jezrael · Accepted Answer · 2020-09-10 13:42:32Z

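Use Series.unstack with fill_value=0, followed by DataFrame.stack. Unstacking pivots status into columns, so every date gets a cell for every status and the combinations that had no rows are filled with 0; stacking then returns the result to long format. You can also drop the double brackets [[]] to select a Series, and call sum directly instead of agg('sum'):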

df_group = (df.groupby(['date', 'status'])['value']
              .sum()
              .unstack(fill_value=0)
              .stack()
              .reset_index(name='value'))
print(df_group)
      date  status  value
0  2020-01   False     12
1  2020-01    True     11
2  2020-02   False      0
3  2020-02    True      5
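
As an alternative sketch (not part of the original answer), the same result can be obtained by reindexing the grouped sums onto the full grid of (date, status) pairs built with pd.MultiIndex.from_product. The sample data below is reconstructed from the question, and storing status as booleans is an assumption:

```python
import pandas as pd

# Sample data mirroring the question (status as booleans is an assumption).
df = pd.DataFrame({
    "date": ["2020-01", "2020-01", "2020-02", "2020-01", "2020-01"],
    "status": [True, True, True, False, False],
    "value": [3, 8, 5, 1, 11],
})

# Sum per observed (date, status) pair; pairs with no rows are absent here.
sums = df.groupby(["date", "status"])["value"].sum()

# Build every (date, status) combination from the values seen in each column.
full_index = pd.MultiIndex.from_product(
    [df["date"].unique(), df["status"].unique()],
    names=["date", "status"],
)

# Reindex onto the full grid, filling pairs that had no rows with 0.
df_group = sums.reindex(full_index, fill_value=0).reset_index()
print(df_group)
```

This variant may be easier to read when there are more than two grouping keys, since the grid is spelled out explicitly rather than implied by the unstack/stack round trip.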
