How to use pandas aggregate functions on this DataFrame?

Question

This is the table:

order_id    product_id  reordered   department_id
2           33120       1           16
2           28985       1           4
2           9327        0           13
2           45918       1           13
3           17668       1           16
3           46667       1           4
3           17461       1           12
3           32665       1           3
4           46842       0           3

I want to group by department_id and count the number of orders that come from each department, as well as the number of orders from that department where reordered == 0. The resulting table would look like this:

department_id     number_of_orders     number_of_reordered_0
3                 2                    1
4                 2                    0
12                1                    0
13                2                    1
16                2                    0

I know this can be done in SQL (I have forgotten what that query would look like, so a reminder would be welcome too). But which pandas functions make this work?

I know that it starts with df.groupby('department_id').sum(). Not sure how to flesh out the rest of the line.

jezrael · Accepted Answer · 2019-05-16 07:13:38Z

Use GroupBy.agg with GroupBy.size to count the rows in each group, and a lambda function that compares the values with Series.eq and counts the matches with sum (True values are treated as 1):

# Per department: count all rows, and count the rows where reordered == 0
df1 = (df.groupby('department_id')['reordered']
         .agg([('number_of_orders', 'size'),
               ('number_of_reordered_0', lambda x: x.eq(0).sum())])
         .reset_index())
print(df1)
   department_id  number_of_orders  number_of_reordered_0
0              3                 2                      1
1              4                 2                      0
2             12                 1                      0
3             13                 2                      1
4             16                 2                      0

If the values can only be 1 and 0, it is also possible to use sum and then subtract the result from the group size:

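# For a 0/1 column, sum counts the rows where reordered == 1
df1 = (df.groupby('department_id')['reordered']
         .agg([('number_of_orders', 'size'), ('number_of_reordered_0', 'sum')])
         .reset_index())

# Rows with reordered == 0 are all rows minus the rows with reordered == 1
df1['number_of_reordered_0'] = df1['number_of_orders'] - df1['number_of_reordered_0']
print(df1)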
   department_id  number_of_orders  number_of_reordered_0
0              3                 2                      1
1              4                 2                      0
2             12                 1                      0
3             13                 2                      1
4             16                 2                      0
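
As a possible alternative to the list of tuples, the same aggregation can be sketched with named aggregation (keyword arguments to agg, available since pandas 0.25). The DataFrame construction below is only an assumption about how the question's table is loaded; the column names come from the question.

```python
import pandas as pd

# Sample data copied from the table in the question
df = pd.DataFrame({
    "order_id": [2, 2, 2, 2, 3, 3, 3, 3, 4],
    "product_id": [33120, 28985, 9327, 45918, 17668, 46667, 17461, 32665, 46842],
    "reordered": [1, 1, 0, 1, 1, 1, 1, 1, 0],
    "department_id": [16, 4, 13, 13, 16, 4, 12, 3, 3],
})

# Each keyword becomes an output column; the value is the aggregation to apply
df1 = (
    df.groupby("department_id")["reordered"]
      .agg(
          number_of_orders="size",                         # all rows in the group
          number_of_reordered_0=lambda x: x.eq(0).sum(),   # rows where reordered == 0
      )
      .reset_index()
)
print(df1)
```

The keyword form keeps each output column name next to its aggregation, which may be easier to read than positional tuples.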

Zaynul Abadin Tuhin · Accepted Answer · 2019-05-16 07:10:33Z

In SQL, it would be a simple aggregation:

select department_id,
       count(*) as number_of_orders,
       sum(case when reordered = 0 then 1 else 0 end) as number_of_reordered_0
from table_name
group by department_id
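
To try the query without a database server, here is a minimal sketch that runs it against an in-memory SQLite database from Python. The table name orders and the column types are assumptions made for illustration, and an order by clause is added only so the row order is predictable.

```python
import sqlite3

# Sample rows copied from the table in the question:
# (order_id, product_id, reordered, department_id)
rows = [
    (2, 33120, 1, 16), (2, 28985, 1, 4), (2, 9327, 0, 13),
    (2, 45918, 1, 13), (3, 17668, 1, 16), (3, 46667, 1, 4),
    (3, 17461, 1, 12), (3, 32665, 1, 3), (4, 46842, 0, 3),
]

conn = sqlite3.connect(":memory:")
# "orders" is a hypothetical table name used only for this example
conn.execute(
    "create table orders ("
    "order_id integer, product_id integer, "
    "reordered integer, department_id integer)"
)
conn.executemany("insert into orders values (?, ?, ?, ?)", rows)

query = """
select department_id,
       count(*) as number_of_orders,
       sum(case when reordered = 0 then 1 else 0 end) as number_of_reordered_0
from orders
group by department_id
order by department_id
"""
result = conn.execute(query).fetchall()
conn.close()

for row in result:
    print(row)
```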
