
I have a data frame (combined_ranking_df) like this in pandas (Python):

                Id  Rank                         Activity
0              14035   8.0                         deployed
1              47728   8.0                         deployed
2              24259   1.0                         NaN
3              24259   6.0                         WIP
4              14251   8.0                         deployed
5              14250   1.0                         NaN
6              14250   6.0                         WIP
7              14250   5.0                         NaN
8              14250   5.0                         NaN
9              14250   1.0                         NaN

I am trying to get, for each Id, the row with the maximum Rank. For example, for 14250 it should be 6.0, and for 24259 it should be 6.0. The expected result is:

                Id  Rank                         Activity
0              14035   8.0                         deployed
1              47728   8.0                         deployed
3              24259   6.0                         WIP
4              14251   8.0                         deployed
6              14250   6.0                         WIP

I tried combined_ranking_df.groupby(['Id'], sort=False)['Rank'].max(), but the result I got was the first dataframe (nothing changed).

What am I doing wrong?
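For reference, a minimal reproduction of the sample data (values copied from the first table; df stands in for combined_ranking_df) suggests what the attempted expression actually returns: a new Series with one maximum Rank per Id, indexed by Id. It is an aggregation, so it neither filters nor modifies the original frame, and the Activity column does not appear in it.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question; df stands in for combined_ranking_df.
df = pd.DataFrame({
    'Id': [14035, 47728, 24259, 24259, 14251, 14250, 14250, 14250, 14250, 14250],
    'Rank': [8.0, 8.0, 1.0, 6.0, 8.0, 1.0, 6.0, 5.0, 5.0, 1.0],
    'Activity': ['deployed', 'deployed', np.nan, 'WIP', 'deployed',
                 np.nan, 'WIP', np.nan, np.nan, np.nan],
})

# The attempted expression: an aggregation returning a new Series
# (one max Rank per Id, indexed by Id). It does not modify df.
max_rank = df.groupby(['Id'], sort=False)['Rank'].max()
print(max_rank)

# df itself is unchanged: still all 10 rows.
print(len(df))
```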

4 Answers

8

Option 1
Same as @ayhan's answer here
This answers the question by sorting the dataframe so that the maximal 'Rank' ends up in the last position within each 'Id' group. pd.DataFrame.drop_duplicates lets us keep the first or last row of each group. This is a handy coincidence that is very fast, but it does not generalize to, say, the top two rows per 'Id'.

df.sort_values('Rank').drop_duplicates('Id', keep='last')

      Id  Rank  Activity
3  24259   6.0       WIP
6  14250   6.0       WIP
0  14035   8.0  deployed
1  47728   8.0  deployed
4  14251   8.0  deployed

You can sort the index at the end to restore the original row order:

df.sort_values('Rank').drop_duplicates('Id', keep='last').sort_index()

      Id  Rank  Activity
0  14035   8.0  deployed
1  47728   8.0  deployed
3  24259   6.0       WIP
4  14251   8.0  deployed
6  14250   6.0       WIP
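A self-contained sketch of Option 1 on the question's sample data follows. It passes keep as a keyword argument (newer pandas releases no longer accept it positionally) and requests a stable sort, an added assumption so that the row kept among tied Ranks is deterministic.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Id': [14035, 47728, 24259, 24259, 14251, 14250, 14250, 14250, 14250, 14250],
    'Rank': [8.0, 8.0, 1.0, 6.0, 8.0, 1.0, 6.0, 5.0, 5.0, 1.0],
    'Activity': ['deployed', 'deployed', np.nan, 'WIP', 'deployed',
                 np.nan, 'WIP', np.nan, np.nan, np.nan],
})

# Ascending stable sort: within each Id, the row with the largest Rank ends up last.
# keep='last' then retains exactly that row per Id; sort_index restores the original order.
result = (
    df.sort_values('Rank', kind='stable')
      .drop_duplicates('Id', keep='last')
      .sort_index()
)
print(result)
```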

Option 2
groupby and idxmax
This is what I'd consider the most idiomatic way to solve this problem. @MaxU's answer is the best approach if you need to generalize to the largest n rows per 'Id'.

df.loc[df.groupby('Id', sort=False).Rank.idxmax()]

      Id  Rank  Activity
0  14035   8.0  deployed
1  47728   8.0  deployed
3  24259   6.0       WIP
4  14251   8.0  deployed
6  14250   6.0       WIP
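A runnable sketch of Option 2 on the same sample data. GroupBy.idxmax returns the index label of the row holding the largest Rank in each Id group (the first such row if several tie), and .loc then selects those full rows, Activity included.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Id': [14035, 47728, 24259, 24259, 14251, 14250, 14250, 14250, 14250, 14250],
    'Rank': [8.0, 8.0, 1.0, 6.0, 8.0, 1.0, 6.0, 5.0, 5.0, 1.0],
    'Activity': ['deployed', 'deployed', np.nan, 'WIP', 'deployed',
                 np.nan, 'WIP', np.nan, np.nan, np.nan],
})

# One index label per Id: the row where Rank is largest within that group.
labels = df.groupby('Id', sort=False)['Rank'].idxmax()

# Look those labels up to get complete rows.
result = df.loc[labels]
print(result)
```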

3 Comments

Hi, I tried this but it's still the same. Is the datatype of my column wrong or something? I did this: groups = combined_ranking_df.loc[combined_ranking_df.groupby('Id', sort=False).Rank.idxmax()]
You tell me! Run combined_ranking_df.dtypes and see if 'Rank' is float.
If not, convert the column first and then run the same thing: combined_ranking_df['Rank'] = combined_ranking_df['Rank'].astype(float); combined_ranking_df.loc[combined_ranking_df.groupby('Id', sort=False).Rank.idxmax()]
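A sketch of the dtype check suggested in the comments, assuming (hypothetically; the thread does not confirm it) that Rank was read in as text. With object dtype, max compares strings lexicographically, so '6.0' outranks '10.0'; converting the column once with astype(float) before calling idxmax avoids that. The small frame below is made up for illustration.

```python
import pandas as pd

# Hypothetical frame where Rank was loaded as text (object dtype).
df = pd.DataFrame({
    'Id': [24259, 24259, 14250, 14250],
    'Rank': ['1.0', '6.0', '6.0', '10.0'],
})
print(df.dtypes)  # Rank is object, not float64

# On strings, max compares character by character, so '6.0' beats '10.0'.
text_max = df.groupby('Id', sort=False)['Rank'].max()

# Convert once, then idxmax compares numbers.
df['Rank'] = df['Rank'].astype(float)
result = df.loc[df.groupby('Id', sort=False)['Rank'].idxmax()]
print(result)
```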
5

IIUC:

In [40]: df.groupby('Id', as_index=False, sort=False) \
    ...:   .apply(lambda x: x.nlargest(1, ['Rank'])) \
    ...:   .reset_index(level=1, drop=True)
Out[40]:
      Id  Rank  Activity
0  14035   8.0  deployed
1  47728   8.0  deployed
2  24259   6.0       WIP
3  14251   8.0  deployed
4  14250   6.0       WIP

or a nicer version from @piRSquared:

In [41]: df.groupby('Id', group_keys=False, sort=False) \
    ...:   .apply(pd.DataFrame.nlargest, n=1, columns='Rank')
Out[41]:
      Id  Rank  Activity
0  14035   8.0  deployed
1  47728   8.0  deployed
3  24259   6.0       WIP
4  14251   8.0  deployed
6  14250   6.0       WIP
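The first answer points to this nlargest approach as the one that generalizes to the largest n rows per Id. As a sketch of that generalization, the block below swaps in a different technique, a descending stable sort_values followed by GroupBy.head, rather than apply with nlargest; top_n_per_id is a hypothetical helper name and n=2 is an arbitrary choice.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Id': [14035, 47728, 24259, 24259, 14251, 14250, 14250, 14250, 14250, 14250],
    'Rank': [8.0, 8.0, 1.0, 6.0, 8.0, 1.0, 6.0, 5.0, 5.0, 1.0],
    'Activity': ['deployed', 'deployed', np.nan, 'WIP', 'deployed',
                 np.nan, 'WIP', np.nan, np.nan, np.nan],
})


def top_n_per_id(frame, n):
    """Hypothetical helper: keep the n rows with the largest Rank within each Id."""
    return (
        # Largest Rank first; 'stable' keeps the original order among ties.
        frame.sort_values('Rank', ascending=False, kind='stable')
             # head(n) takes the first n rows of each group in the sorted order.
             .groupby('Id', sort=False)
             .head(n)
             # Restore the original row order.
             .sort_index()
    )


top1 = top_n_per_id(df, 1)
top2 = top_n_per_id(df, 2)
print(top2)
```

With n=1 this reduces to the one-row-per-Id result shown above.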

3 Comments

df.groupby('Id', group_keys=False, sort=False).apply(pd.DataFrame.nlargest, n=1, columns='Rank')
@piRSquared, I totally forgot about the group_keys parameter - thanks a lot!
I've got your back!
4

Try storing the grouped result and then inspecting what was stored:

groups = combined_ranking_df.groupby(['Id'], as_index=False, sort=False)['Rank'].max()

      Id  Rank
0  14035   8.0
1  47728   8.0
2  24259   6.0
3  14251   8.0
4  14250   6.0
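This aggregation keeps only Id and Rank. If the full rows (with Activity) are needed, one option, swapped in here rather than taken from the answer, is to merge the per-Id maxima back onto the original frame on both Id and Rank. A sketch on the question's sample data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Id': [14035, 47728, 24259, 24259, 14251, 14250, 14250, 14250, 14250, 14250],
    'Rank': [8.0, 8.0, 1.0, 6.0, 8.0, 1.0, 6.0, 5.0, 5.0, 1.0],
    'Activity': ['deployed', 'deployed', np.nan, 'WIP', 'deployed',
                 np.nan, 'WIP', np.nan, np.nan, np.nan],
})

# One row per Id with its maximum Rank (columns: Id, Rank).
groups = df.groupby('Id', as_index=False, sort=False)['Rank'].max()

# Inner join on (Id, Rank) keeps only the original rows that hold their group's max.
rows = df.merge(groups, on=['Id', 'Rank'], how='inner')
print(rows)
```

Because the join matches on Rank, an Id whose maximum appears in several rows would return all of them.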

1 Comment

sort=False parameter to get OP's output
3

You can create a boolean index to check whether the Rank for a given Id equals its max value. Then use boolean indexing to extract the rows holding those max values from the dataframe.

The mask is created using a groupby on Id with the help of transform, which preserves the original dimensions of the dataframe.

>>> df[df['Rank'] == df.groupby('Id')['Rank'].transform('max')]
      Id  Rank  Activity
0  14035     8  deployed
1  47728     8  deployed
3  24259     6       WIP
4  14251     8  deployed
6  14250     6       WIP
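A self-contained sketch of the same mask, using the string alias 'max' with transform. The transformed Series has the same length as df, so it can be compared row by row. One consequence of an equality mask, shown with a hypothetical extra row (index 10) that ties the maximum for Id 24259, is that every tied row is kept, whereas idxmax keeps only the first.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    # Rows 0-9 are the question's data; row 10 is a hypothetical tie for Id 24259.
    'Id': [14035, 47728, 24259, 24259, 14251, 14250, 14250, 14250, 14250, 14250, 24259],
    'Rank': [8.0, 8.0, 1.0, 6.0, 8.0, 1.0, 6.0, 5.0, 5.0, 1.0, 6.0],
    'Activity': ['deployed', 'deployed', np.nan, 'WIP', 'deployed',
                 np.nan, 'WIP', np.nan, np.nan, np.nan, np.nan],
})

# transform broadcasts each group's max back onto every row of that group.
group_max = df.groupby('Id')['Rank'].transform('max')

# Row-by-row comparison: True wherever a row holds its group's maximum.
mask = df['Rank'] == group_max
result = df[mask]
print(result)

# For contrast: idxmax returns a single (first) row per Id.
idxmax_rows = df.loc[df.groupby('Id')['Rank'].idxmax()]
```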

2 Comments

I've learned so much from your answers... I wish I saw them more often (-:
Very interesting and unusual approach - i really like it!
