I have a sample dataset:

Id  Category
1   Active
1   Active
1   Active
1   End
2   Paused
2   Active
2   Active
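
For reproducibility, here is a minimal sketch that builds this sample as a pandas DataFrame. It assumes integer Id values and string Category values, since the dtypes are not stated in the question.

```python
import pandas as pd

# Sample data from the question; dtypes (int Id, str Category) are an assumption.
df = pd.DataFrame({
    "Id": [1, 1, 1, 1, 2, 2, 2],
    "Category": ["Active", "Active", "Active", "End", "Paused", "Active", "Active"],
})
```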

The expected output is a new column holding a counter that is computed per Id and resets whenever Category changes.

Expected output:

Id  Category  Count
1   Active    0
1   Active    1
1   Active    2
1   End       0
2   Paused    0
2   Active    0
2   Active    1

I have already tried the following:

m = df['Category'] != df['Category'].shift(-1)
df['count'] = np.where(m, df.groupby(m.ne(m.shift(),'Id').cumsum()).cumcount()+1, 0)

but it fills the column with only 0.

I have also tried this:

mask = df['Id'] == df['Id'].shift(-1)
df['CatChange'] = df['Category'] != df['Category'].shift(-1)
count = df[mask].groupby('Id').cumcount()
df['CatChange_num'] = count

This just increments the value without taking the Category change into account.

Any pointers will be helpful.

2 Answers

You can try:

df['count'] = df.groupby(['Id','Category']).cumcount()

And if you want your count to start from 1, you can do:

df['count2'] = df.groupby(['Id','Category']).cumcount() + 1

Which prints:

   Id Category  count  count2
0   1   Active      0       1
1   1   Active      1       2
2   1   Active      2       3
3   1      End      0       1
4   2   Paused      0       1
5   2   Active      0       1
6   2   Active      1       2
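
Note that grouping by `['Id', 'Category']` counts every occurrence of a category within an Id, so the counter would keep increasing if a category came back later (for example Active, End, Active). If the counter should restart on every change, one option is to group by consecutive runs instead. A minimal sketch, assuming the rows are already in the desired order; `run_counter` is a hypothetical helper name, not something from pandas:

```python
import pandas as pd

def run_counter(frame):
    """Count rows within each consecutive run of the same (Id, Category)."""
    # A new run starts whenever Id or Category differs from the previous row.
    new_run = (frame["Id"] != frame["Id"].shift()) | (
        frame["Category"] != frame["Category"].shift()
    )
    # The cumulative sum of the run-start flags labels each run with its own id.
    return frame.groupby(new_run.cumsum()).cumcount()

df = pd.DataFrame({
    "Id": [1, 1, 1, 1, 2, 2, 2],
    "Category": ["Active", "Active", "Active", "End", "Paused", "Active", "Active"],
})
df["run_count"] = run_counter(df)

# A case where the two approaches differ: Active reappears after End.
df2 = pd.DataFrame({"Id": [1, 1, 1], "Category": ["Active", "End", "Active"]})
by_run = run_counter(df2)
by_group = df2.groupby(["Id", "Category"]).cumcount()
```

On the sample data both approaches give the same counts; on `df2` the run-based counter restarts at the second Active row, while the plain two-column `cumcount` continues from the first one.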

We can group by the two columns and use cumcount:

df.groupby(['Id','Category']).cumcount()
0    0
1    1
2    2
3    0
4    0
5    0
6    1
dtype: int64
