
I have a dataset where I would like to group by location and box and, for each group, show the number of distinct boxes in that location.

Data

ID  location    type    box     status          
aa  NY          no      box55   hey         
aa  NY          no      box55   hi          
aa  NY          yes     box66   hello           
aa  NY          yes     box66   goodbye         
aa  CA          no      box11   hey         
aa  CA          no      box11   hi          
aa  CA          yes     box11   hello           
aa  CA          yes     box11   goodbye         
aa  CA          no      box86   hey         
aa  CA          no      box86   hi          
aa  CA          yes     box86   hello           
aa  CA          yes     box99   goodbye         
aa  CA          no      box99   hey         
aa  CA          no      box99   hi  

    
                        
                        

Desired

location    box count   box     
NY          2           box55   
NY          2           box66   
CA          3           box11   
CA          3           box86   
CA          3           box99   

Doing

df['box count'] = df.groupby(['location','box'])['box'].size()

Any suggestion is appreciated.

  • And what is wrong with your solution? Commented Nov 22, 2022 at 23:41

1 Answer


To count the rows in each (location, box) group, try:

df = df.groupby(["location", "box"], as_index=False).agg(
    **{"box count": ("box", "size")}
)
print(df)

Prints:

  location    box  box count
0       CA  box11          4
1       CA  box86          3
2       CA  box99          3
3       NY  box55          2
4       NY  box66          2

EDIT: To get the number of distinct boxes per location instead (run on the original df):

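# Number of distinct boxes in each location (a Series indexed by location).
m = df.groupby(["location"])["box"].nunique()
df = df.groupby(["location", "box"], as_index=False).agg(
    **{
        "box count": (
            "location",
            # Every row in a group shares one location, so look up its count in m.
            lambda x: m[x.iat[0]],
        )
    }
)
print(df)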

Prints:

  location    box  box count
0       CA  box11          3
1       CA  box86          3
2       CA  box99          3
3       NY  box55          2
4       NY  box66          2
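
The same result can probably be reached without the lambda. The following is a minimal sketch, assuming only the location and box columns matter: it keeps one row per (location, box) pair and broadcasts the per-location distinct count with transform. The DataFrame is rebuilt here from the question's data, and `out` is just a name chosen for illustration.

```python
import pandas as pd

# Sample data rebuilt from the question (only the two columns used here).
df = pd.DataFrame(
    {
        "location": ["NY"] * 4 + ["CA"] * 10,
        "box": ["box55"] * 2 + ["box66"] * 2
        + ["box11"] * 4 + ["box86"] * 3 + ["box99"] * 3,
    }
)

# Keep one row per (location, box) pair, in order of first appearance.
out = df[["location", "box"]].drop_duplicates().reset_index(drop=True)

# Count the distinct boxes in each location and broadcast the count to every row.
out["box count"] = out.groupby("location")["box"].transform("nunique")

print(out)
```

Unlike the groupby/agg version, this keeps the rows in their original order (NY first) rather than sorting by the group keys.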

1 Comment

Hi @Andrej, the 4 should be a 3, right? There are 3 distinct boxes in the CA location (box11, box86, and box99).
