This is the table:

order_id    product_id  reordered   department_id
2           33120       1           16
2           28985       1           4
2           9327        0           13
2           45918       1           13
3           17668       1           16
3           46667       1           4
3           17461       1           12
3           32665       1           3
4           46842       0           3

I want to group by department_id and count the number of orders that come from each department, as well as the number of orders from that department where reordered == 0. The resulting table would look like this:

department_id     number_of_orders     number_of_reordered_0
3                 2                    1
4                 2                    0
12                1                    0
13                2                    1
16                2                    0

I know this can be done in SQL (I've forgotten what that query would look like, so a refresher on that would be great too). But what are the pandas functions to make that work?

I know that it starts with df.groupby('department_id').sum(). Not sure how to flesh out the rest of the line.

2 Answers

1

Use GroupBy.agg with DataFrameGroupBy.size for the row count, plus a lambda function that compares values with Series.eq and counts the matches by summing the True values (True is treated as 1):

df1 = (df.groupby('department_id')['reordered']
         .agg([('number_of_orders','size'), ('number_of_reordered_0',lambda x: x.eq(0).sum())])
         .reset_index())
print (df1)
   department_id  number_of_orders  number_of_reordered_0
0              3                 2                      1
1              4                 2                      0
2             12                 1                      0
3             13                 2                      1
4             16                 2                      0

If the only possible values are 1 and 0, you can use sum instead and then subtract it from the count:

df1 = (df.groupby('department_id')['reordered']
         .agg([('number_of_orders','size'), ('number_of_reordered_0','sum')])
         .reset_index())

df1['number_of_reordered_0'] = df1['number_of_orders'] - df1['number_of_reordered_0']
print (df1)
   department_id  number_of_orders  number_of_reordered_0
0              3                 2                      1
1              4                 2                      0
2             12                 1                      0
3             13                 2                      1
4             16                 2                      0
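
For reference, the same result can probably also be written with keyword-style named aggregation instead of a list of tuples. This is only a sketch, assuming pandas 0.25 or later; the DataFrame is rebuilt from the sample rows in the question so the snippet runs on its own:

```python
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    "order_id":      [2, 2, 2, 2, 3, 3, 3, 3, 4],
    "product_id":    [33120, 28985, 9327, 45918, 17668, 46667, 17461, 32665, 46842],
    "reordered":     [1, 1, 0, 1, 1, 1, 1, 1, 0],
    "department_id": [16, 4, 13, 13, 16, 4, 12, 3, 3],
})

df1 = (
    df.groupby("department_id")["reordered"]
      .agg(
          # 'size' counts every row in the group.
          number_of_orders="size",
          # Compare to 0, then sum the resulting booleans (True counts as 1).
          number_of_reordered_0=lambda s: s.eq(0).sum(),
      )
      .reset_index()
)
print(df1)
```

The keyword names become the output column names, which keeps the renaming and the aggregation in one place.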

1

In SQL it would be a simple aggregation:

select department_id,count(*) as number_of_orders,
sum(case when reordered=0 then 1 else 0 end) as number_of_reordered_0
from table_name
group by department_id
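
One way to try the query above without a database server is Python's built-in sqlite3 module. The sketch below is only an illustration: the table name orders is a hypothetical stand-in for the placeholder in the query, the rows are the sample data from the question, and the ORDER BY clause is an addition to make the output order predictable:

```python
import sqlite3

# Sample rows from the question: (order_id, product_id, reordered, department_id).
rows = [
    (2, 33120, 1, 16), (2, 28985, 1, 4), (2, 9327, 0, 13),
    (2, 45918, 1, 13), (3, 17668, 1, 16), (3, 46667, 1, 4),
    (3, 17461, 1, 12), (3, 32665, 1, 3), (4, 46842, 0, 3),
]

conn = sqlite3.connect(":memory:")
# 'orders' is a hypothetical table name chosen for this example.
conn.execute(
    "CREATE TABLE orders ("
    "order_id INTEGER, product_id INTEGER, "
    "reordered INTEGER, department_id INTEGER)"
)
conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", rows)

query = """
SELECT department_id,
       COUNT(*) AS number_of_orders,
       SUM(CASE WHEN reordered = 0 THEN 1 ELSE 0 END) AS number_of_reordered_0
FROM orders
GROUP BY department_id
ORDER BY department_id  -- added here only for a stable row order
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
conn.close()
```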
