Finding the max value in Python Column

Question

I have a pandas data frame (combined_ranking_df) like this:

                Id  Rank                         Activity
0              14035   8.0                         deployed
1              47728   8.0                         deployed
2              24259   1.0                         NaN
3              24259   6.0                         WIP
4              14251   8.0                         deployed
5              14250   1.0                         NaN
6              14250   6.0                         WIP
7              14250   5.0                         NaN
8              14250   5.0                         NaN
9              14250   1.0                         NaN

I am trying to get the max Rank for each Id. For example, for 14250 it should be 6.0, and for 24259 it should also be 6.0.

                Id  Rank                         Activity
0              14035   8.0                         deployed
1              47728   8.0                         deployed
3              24259   6.0                         WIP
4              14251   8.0                         deployed
6              14250   6.0                         WIP

I tried combined_ranking_df.groupby(['Id'], sort=False)['Rank'].max(), but the result I got was the same as the first dataframe (nothing changed).

What am I doing wrong?

piRSquared · Accepted Answer · 2017-07-12 18:01:11Z

8

Option 1
Same as @ayhan's answer here
This answers the question by sorting the dataframe by 'Rank', which leaves the maximal value in the last position within each 'Id' group. pd.DataFrame.drop_duplicates then lets us keep the first or last row of each group. This is very fast, but it works only by a handy coincidence: it does not generalize to, say, the top two per 'Id'.

df.sort_values('Rank').drop_duplicates('Id', keep='last')

      Id  Rank  Activity
3  24259   6.0       WIP
6  14250   6.0       WIP
0  14035   8.0  deployed
1  47728   8.0  deployed
4  14251   8.0  deployed

You can sort the index at the end

df.sort_values('Rank').drop_duplicates('Id', keep='last').sort_index()

      Id  Rank  Activity
0  14035   8.0  deployed
1  47728   8.0  deployed
3  24259   6.0       WIP
4  14251   8.0  deployed
6  14250   6.0       WIP

Option 2
groupby and idxmax
This is what I'd consider the most idiomatic way to solve this problem. @MaxU's answer is the best way that generalizes to the largest n per 'Id'.

df.loc[df.groupby('Id', sort=False).Rank.idxmax()]

      Id  Rank  Activity
0  14035   8.0  deployed
1  47728   8.0  deployed
3  24259   6.0       WIP
4  14251   8.0  deployed
6  14250   6.0       WIP
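To try Option 2 end to end, here is a minimal sketch that rebuilds the question's frame by hand (the name df and the literal values are copied from the question, not from the asker's real data). It also shows that the expression from the question returns a new Series of per-Id maxima rather than modifying the frame, which may be why nothing appeared to change.

```python
import numpy as np
import pandas as pd

# Hand-built copy of the frame shown in the question.
df = pd.DataFrame({
    "Id": [14035, 47728, 24259, 24259, 14251, 14250, 14250, 14250, 14250, 14250],
    "Rank": [8.0, 8.0, 1.0, 6.0, 8.0, 1.0, 6.0, 5.0, 5.0, 1.0],
    "Activity": ["deployed", "deployed", np.nan, "WIP", "deployed",
                 np.nan, "WIP", np.nan, np.nan, np.nan],
})

# Per-Id maxima only: a new Series indexed by Id; df itself is untouched.
max_rank = df.groupby("Id", sort=False)["Rank"].max()

# Full rows holding each Id's maximum Rank, selected by index label.
top_rows = df.loc[df.groupby("Id", sort=False)["Rank"].idxmax()]
print(top_rows)
```

Either result has to be assigned to a name (or printed) to be used, since groupby does not change the original frame.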

edited Jul 12, 2017 at 18:01

answered Jul 12, 2017 at 17:32

piRSquared


3 Comments

Adam Over a year ago

Hi I tried doing this but it still the same. Is the datatype of my column wrong or something? I did this: groups = combined_ranking_df.loc[combined_ranking_df.groupby('Id', sort=False).Rank.idxmax()]

piRSquared Over a year ago

You tell me! Run combined_ranking_df.dtypes and see if 'Rank' is float

piRSquared Over a year ago

If not, run this instead: combined_ranking_df.loc[combined_ranking_df.Rank.astype(float).groupby(combined_ranking_df.Id, sort=False).idxmax()]
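A small sketch of the dtype check discussed in these comments. The three-row frame is hypothetical, and it assumes the problem is that 'Rank' was loaded as strings; pd.to_numeric is used here as one way to convert the column before calling idxmax.

```python
import pandas as pd

# Hypothetical frame where Rank was read in as strings (object dtype).
df = pd.DataFrame({"Id": [1, 1, 2], "Rank": ["1.0", "6.0", "8.0"]})
print(df.dtypes)

# Convert to a numeric dtype so comparisons are numeric, not lexical.
df["Rank"] = pd.to_numeric(df["Rank"])

# Rows holding the maximum Rank per Id.
top = df.loc[df.groupby("Id", sort=False)["Rank"].idxmax()]
print(top)
```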

MaxU - stand with Ukraine · Accepted Answer · 2017-07-12 18:10:47Z

5

IIUC:

In [40]: df.groupby('Id', as_index=False, sort=False) \
    ...:   .apply(lambda x: x.nlargest(1, ['Rank'])) \
    ...:   .reset_index(level=1, drop=True)
Out[40]:
      Id  Rank  Activity
0  14035   8.0  deployed
1  47728   8.0  deployed
2  24259   6.0       WIP
3  14251   8.0  deployed
4  14250   6.0       WIP

or a nicer version from @piRSquared:

In [41]: df.groupby('Id', group_keys=False, sort=False) \
    ...:   .apply(pd.DataFrame.nlargest, n=1, columns='Rank')
Out[41]:
      Id  Rank  Activity
0  14035   8.0  deployed
1  47728   8.0  deployed
3  24259   6.0       WIP
4  14251   8.0  deployed
6  14250   6.0       WIP
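For the "largest n per Id" case, here is a hedged sketch of a different route than the apply/nlargest calls above: sort by 'Rank' descending, then take the first n rows of each group with GroupBy.head. The frame is rebuilt by hand from the question, and n = 2 is an arbitrary choice for illustration.

```python
import numpy as np
import pandas as pd

# Hand-built copy of the frame shown in the question.
df = pd.DataFrame({
    "Id": [14035, 47728, 24259, 24259, 14251, 14250, 14250, 14250, 14250, 14250],
    "Rank": [8.0, 8.0, 1.0, 6.0, 8.0, 1.0, 6.0, 5.0, 5.0, 1.0],
    "Activity": ["deployed", "deployed", np.nan, "WIP", "deployed",
                 np.nan, "WIP", np.nan, np.nan, np.nan],
})

n = 2  # how many top rows to keep per Id (illustrative value)

top_n = (
    df.sort_values("Rank", ascending=False, kind="stable")  # highest Rank first
      .groupby("Id", sort=False)
      .head(n)        # first n rows of each group in the sorted order
      .sort_index()   # restore the original row order
)
print(top_n)
```

Groups with fewer than n rows simply contribute all of their rows.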

edited Jul 12, 2017 at 18:10

answered Jul 12, 2017 at 17:30

MaxU - stand with Ukraine


3 Comments

piRSquared Over a year ago

df.groupby('Id', group_keys=False, sort=False).apply(pd.DataFrame.nlargest, n=1, columns='Rank')

MaxU - stand with Ukraine Over a year ago

@piRSquared, i totally forgot about group_keys parameter - thanks a lot!

piRSquared Over a year ago

I've got your back!

cs95 · Accepted Answer · 2017-07-12 17:36:29Z

4

Try storing the result of the groupby in a variable and then working with that stored result:

groups = combined_ranking_df.groupby('Id', as_index=False, sort=False)['Rank'].max()

      Id  Rank
0  14035   8.0
1  47728   8.0
2  24259   6.0
3  14251   8.0
4  14250   6.0
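The result above keeps only 'Id' and 'Rank'. If the 'Activity' column from the question's expected output is also needed, one possible follow-up (an assumption about the goal, not part of this answer) is to merge the maxima back onto the original frame. The frame is rebuilt by hand from the question.

```python
import numpy as np
import pandas as pd

# Hand-built copy of the frame shown in the question.
df = pd.DataFrame({
    "Id": [14035, 47728, 24259, 24259, 14251, 14250, 14250, 14250, 14250, 14250],
    "Rank": [8.0, 8.0, 1.0, 6.0, 8.0, 1.0, 6.0, 5.0, 5.0, 1.0],
    "Activity": ["deployed", "deployed", np.nan, "WIP", "deployed",
                 np.nan, "WIP", np.nan, np.nan, np.nan],
})

# One row per Id with its maximum Rank (columns: Id, Rank).
groups = df.groupby("Id", as_index=False, sort=False)["Rank"].max()

# Look up the remaining columns for each (Id, max Rank) pair.
with_activity = groups.merge(df, on=["Id", "Rank"], how="left")
print(with_activity)
```

If several rows of one Id tie at the maximum Rank, the merge returns all of them.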

edited Jul 12, 2017 at 17:36

cs95


answered Jul 12, 2017 at 17:29

Diego Aguado


1 Comment

cs95 Over a year ago

sort=False parameter to get OP's output

Alexander · Accepted Answer · 2017-07-12 17:53:31Z

3

You can create a boolean index to check if the Rank for a given Id equals its max value. Then use boolean indexing to extract the max values from the dataframe.

The mask is created using a groupby on Id with the help of transform, which preserves the original dimensions of the dataframe.

>>> df[(df[['Rank']] == df[['Id', 'Rank']].groupby('Id').transform('max')).squeeze().tolist()]
      Id  Rank  Activity
0  14035     8  deployed
1  47728     8  deployed
3  24259     6       WIP
4  14251     8  deployed
6  14250     6       WIP
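The same mask idea can be sketched with Series instead of single-column frames, which avoids the squeeze().tolist() step. This is a simplified variant, not the exact expression above, and it uses a hand-built copy of the question's frame.

```python
import numpy as np
import pandas as pd

# Hand-built copy of the frame shown in the question.
df = pd.DataFrame({
    "Id": [14035, 47728, 24259, 24259, 14251, 14250, 14250, 14250, 14250, 14250],
    "Rank": [8.0, 8.0, 1.0, 6.0, 8.0, 1.0, 6.0, 5.0, 5.0, 1.0],
    "Activity": ["deployed", "deployed", np.nan, "WIP", "deployed",
                 np.nan, "WIP", np.nan, np.nan, np.nan],
})

# transform('max') broadcasts each Id's maximum back to every row of that Id,
# so the comparison yields one boolean per original row.
mask = df["Rank"] == df.groupby("Id")["Rank"].transform("max")

result = df[mask]
print(result)
```

Unlike idxmax, this mask keeps every row that ties for the maximum within an Id.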

edited Jul 12, 2017 at 17:53

answered Jul 12, 2017 at 17:50

Alexander


2 Comments

piRSquared Over a year ago

I've learned so much from your answers... I wish I saw them more often (-:

MaxU - stand with Ukraine Over a year ago

Very interesting and unusual approach - i really like it!
