I have a dataset that looks similar to this:

Name Status Activity
Jane student yes
John businessman yes
Elle student no
Chris policeman yes
John businessman no
Clay businessman yes

I want to group the dataset by Status, keep only the names whose Activity is 'yes', and count those names. A name is counted if it has at least one 'yes'.

Basically, this is the output that I want:

student 1 Jane

businessman 2 John, Clay

policeman 1 Chris

I've tried the following:

cb = (DataFrame.groupby(['Name', 'Status']).sum(DataFrame['Activity'].eq('yes')))

cb = (DataFrame.groupby(['Name', 'Status']).any(DataFrame['Activity'].eq('yes')))

cb = (DataFrame.groupby(['Name', 'Status']).nunique(DataFrame['Activity'].eq('yes')))

but all of them give this error:

The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Please help me to fix this code. Thank you in advance!

2 Answers

Example

import pandas as pd

# Sample data from the question
data = {'Name': {0: 'Jane', 1: 'John', 2: 'Elle', 3: 'Chris', 4: 'John', 5: 'Clay'},
        'Status': {0: 'student', 1: 'businessman', 2: 'student', 3: 'policeman', 4: 'businessman', 5: 'businessman'},
        'Activity': {0: 'yes', 1: 'yes', 2: 'no', 3: 'yes', 4: 'no', 5: 'yes'}}
df = pd.DataFrame(data)

Code

out = (df[df['Activity'].eq('yes')]
       .groupby('Status', sort=False)['Name'].agg(['count', ', '.join]))

out

            count   join
Status      
student     1       Jane
businessman 2       John, Clay
policeman   1       Chris

2 Comments

Thanks for the answer, it works well. But when I apply this code to my real data, it counts the number of activities, whereas I want the number of distinct names that have Activity as 'yes'. Why is that?
It's my job to solve the example and yours to apply it to your dataset. It is not difficult to get your output using this solution. For various reasons, I only solve examples and do not take additional questions. The following function will help you: pandas.pydata.org/docs/reference/api/pandas.unique.html
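
For the distinct-name case raised in the comment above, one possible approach is to aggregate with 'nunique' and join only the unique names. This is a minimal sketch, not the answerer's own code: the sample frame adds one hypothetical extra row (John, businessman, yes) so that a repeated name actually occurs, and the output column names `count` and `names` are chosen here for illustration.

```python
import pandas as pd

# Sample data from the question plus one HYPOTHETICAL extra row
# (John, businessman, yes) so that a name repeats among the 'yes' rows.
df = pd.DataFrame({
    'Name': ['Jane', 'John', 'Elle', 'Chris', 'John', 'Clay', 'John'],
    'Status': ['student', 'businessman', 'student', 'policeman',
               'businessman', 'businessman', 'businessman'],
    'Activity': ['yes', 'yes', 'no', 'yes', 'no', 'yes', 'yes'],
})

out = (
    df[df['Activity'].eq('yes')]           # keep only rows with Activity == 'yes'
    .groupby('Status', sort=False)['Name']
    .agg(
        count='nunique',                   # number of distinct names per Status
        names=lambda s: ', '.join(s.unique()),  # each name listed once
    )
)

print(out)
```

With a plain 'count', the businessman group in this frame would be 3, because John appears twice among the 'yes' rows; 'nunique' counts him once.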

Check below:

dd = (df.query("Activity != 'no'")
        .groupby('Status')
        .agg({'Name': [','.join, 'count']})
        .reset_index())

dd.columns = ['Status','Names','count']

dd.head()
