Here is the data frame

import pandas as pd

big = pd.DataFrame({'group': ['A', 'A', 'A', 'A', 'B', 'B', 'C', 'D', 'D', 'D'], 'animal': ['ALL other', 'cat', 'rabbit', 'dog', 'rabbit', 'ALL other', 'ALL', 'ALL other', 'dog', 'cat']})
big
  group     animal
0     A  ALL other
1     A        cat
2     A     rabbit
3     A        dog
4     B     rabbit
5     B  ALL other
6     C        ALL
7     D  ALL other
8     D        dog
9     D        cat

The rule is: if a group contains 'rabbit', pick that row; if the animal is 'ALL', pick it and treat it as rabbit; if the group has no rabbit, pick 'ALL other' and treat it as rabbit.

The desired small data frame is:

  group     animal
0     A     rabbit
1     B     rabbit
2     C        ALL
3     D  ALL other
  • Is this homework? If so, show us what you have tried so far. Commented Jan 17, 2023 at 3:37
  • Try using the .loc method to filter the rows in the DataFrame. Commented Jan 17, 2023 at 3:42

1 Answer

First filter the DataFrame to keep only the 'rabbit', 'ALL' and 'ALL other' rows. Then take advantage of the fact that 'rabbit' sorts after 'ALL' and 'ALL other' in lexicographic order (uppercase letters sort before lowercase ones) and use groupby.max:

m = big['animal'].isin(['rabbit', 'ALL', 'ALL other'])

big[m].groupby('group', as_index=False).max()

For a generic approach, make "animal" an ordered Categorical and you will be able to choose any custom order.

Output:


  group     animal
0     A     rabbit
1     B     rabbit
2     C        ALL
3     D  ALL other
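
The ordered Categorical approach mentioned above could be sketched roughly as follows. The names `priority`, `subset` and `small` are illustrative and not part of the original answer, and the priority order (lowest to highest) is an assumption read off the rule in the question. The point is that the maximum is taken in the order you declare rather than in lexicographic order.

```python
import pandas as pd

big = pd.DataFrame({
    'group': ['A', 'A', 'A', 'A', 'B', 'B', 'C', 'D', 'D', 'D'],
    'animal': ['ALL other', 'cat', 'rabbit', 'dog', 'rabbit',
               'ALL other', 'ALL', 'ALL other', 'dog', 'cat'],
})

# Assumed priority, lowest to highest: 'rabbit' wins over 'ALL',
# which wins over 'ALL other'.
priority = ['ALL other', 'ALL', 'rabbit']

# Keep only the candidate rows, working on a copy to avoid modifying `big`.
subset = big[big['animal'].isin(priority)].copy()

# An ordered Categorical makes max() follow the custom order above.
subset['animal'] = pd.Categorical(subset['animal'],
                                  categories=priority, ordered=True)

small = subset.groupby('group', as_index=False)['animal'].max()
print(small)
```

With this data the result should contain the same four rows as the output above. Changing the order of `priority` changes which animal wins in each group.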
2 Comments

Thanks. How can I keep the original order of the rows? I found that they end up out of order after using groupby.max().
Add sort=False as a parameter to groupby.
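
A small sketch of what sort=False changes, using a hypothetical frame `df` whose groups are deliberately not in alphabetical order (the frame in the question is already sorted by group, so the difference would not show there):

```python
import pandas as pd

# Hypothetical data: groups appear in the order D, B, A.
df = pd.DataFrame({
    'group': ['D', 'D', 'B', 'B', 'A'],
    'animal': ['ALL other', 'cat', 'rabbit', 'ALL other', 'ALL'],
})

m = df['animal'].isin(['rabbit', 'ALL', 'ALL other'])

# Default behaviour: the result is sorted by the group keys.
sorted_out = df[m].groupby('group', as_index=False).max()

# sort=False: groups keep the order in which they first appear.
unsorted_out = df[m].groupby('group', as_index=False, sort=False).max()

print(sorted_out)
print(unsorted_out)
```

Note that sort=False preserves the order of first appearance of each group, not the original row index of the selected rows.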