I have something similar to the following dataframe, df:

df
    full_name       team    rec_yards
0   Michael Thomas   NO        1688      
1   Chris Godwin     NO        1333      
2   DeAndre Hopkins  NO        1165     
3   Julio Jones      NO        1316    
4   Cooper Kupp      NO        800  
5   Adam Thielen     LA        1165     
6   Julian Edelman   LA        1316    
7   Stefon Diggs     LA        1062
8   Alshon Jeffery   LA        1250   
9   AJ Green         LA        800
...

For each team I would like to keep the top four full_name according to rec_yards. Is there a way I can loop through each team to do this?

This is my desired output:

    full_name       team    rec_yards
0   Michael Thomas   NO        1688      
1   Chris Godwin     NO        1333
2   Julio Jones      NO        1316          
3   DeAndre Hopkins  NO        1165            
4   Julian Edelman   LA        1316    
5   Alshon Jeffery   LA        1250 
6   Adam Thielen     LA        1165 
7   Stefon Diggs     LA        1062

This output keeps the top four rec_yards values for each team, sorts them in descending order, and drops any rows that aren't in the top 4 for their team.

What I tried:

I tried turning the df into a multi-level index by using

filter.set_index(['team', 'full_name']).sort_index()

But rec_yards isn't sorted in any particular order. Again, I would just like to keep the top 4 full_name for each team according to rec_yards values. Is a multi-level index the way to go here? Note: the original df is not the multi-level index as someone thought. That is simply the original df.

EDIT: Here is a dictionary of my df

{'full_name': ['Michael Thomas',
  'Chris Godwin',
  'DeAndre Hopkins',
  'Julio Jones',
  'Cooper Kupp',
  'Julian Edelman',
  'Kenny Golladay',
  'Keenan Allen',
  'Amari Cooper',
  'D.J. Moore',
  'Mike Evans',
  'DeVante Parker',
  'Jarvis Landry',
  'Stefon Diggs',
  'Tyler Lockett',
  'Tyler Boyd',
  'John Brown',
  'Robert Woods',
  'Courtland Sutton',
  'Calvin Ridley',
  'A.J. Brown',
  'Terry McLaurin',
  'Davante Adams',
  'Cole Beasley',
  'Michael Gallup',
  'Tyreek Hill',
  'Jamison Crowder',
  'Larry Fitzgerald',
  'Curtis Samuel',
  'Deebo Samuel',
  'Darius Slayton',
  'Mike Williams',
  'Mike Williams',
  'Robby Anderson',
  'Christian Kirk',
  'Diontae Johnson',
  'Chris Conley',
  'Randall Cobb',
  'Tyrell Williams',
  'Marquise Brown',
  'Sammy Watkins',
  'Golden Tate',
  'James Washington',
  'Dede Westbrook',
  'Danny Amendola',
  'Sterling Shepard',
  'Zach Pascal',
  'Anthony Miller',
  'Alshon Jeffery',
  'Kenny Stills',
  'T.Y. Hilton',
  'Adam Thielen',
  'Breshad Perriman',
  'Mecole Hardman',
  'JuJu Smith-Schuster',
  'Hunter Renfrow',
  'Brandin Cooks',
  'Corey Davis',
  'Auden Tate',
  'Emmanuel Sanders',
  'Emmanuel Sanders',
  'Alex Erickson',
  'Kendrick Bourne',
  'Nelson Agholor',
  'Preston Williams',
  'Demarcus Robinson',
  'Taylor Gabriel',
  'Russell Gage',
  'Adam Humphries',
  'Allen Lazard',
  'Allen Hurns',
  'Demaryius Thomas',
  'Marquez Valdes-Scantling',
  'Albert Wilson',
  'Bisi Johnson',
  'Geronimo Allison',
  'Paul Richardson',
  'Keelan Cole',
  'Cody Latimer',
  'Jakobi Meyers',
  'Josh Reynolds',
  'Kelvin Harmon',
  'Seth Roberts',
  'Isaiah McKenzie',
  'David Moore',
  'Josh Gordon',
  'Josh Gordon',
  "Tre'Quan Smith",
  'Jarius Wright',
  'Damiere Byrd',
  'Keke Coutee',
  'DaeSean Hamilton',
  'Pharoh Cooper',
  'Trey Quinn',
  'Miles Boykin',
  'Jaron Brown',
  'Chester Rogers',
  'Tavon Austin',
  'KeeSean Johnson',
  'Malik Turner',
  'Bennie Fowler',
  'Vyncint Smith',
  'Jakeem Grant',
  'Parris Campbell',
  'Marvin Hall',
  'Jake Kumerow',
  'Justin Watson',
  'Marquise Goodwin',
  'Javon Wims',
  'DeSean Jackson',
  'Andy Isabella',
  'Tim Patrick',
  'Byron Pringle',
  'Dante Pettis',
  'Laquon Treadwell',
  'J.J. Arcega-Whiteside',
  "N'Keal Harry",
  'Justin Hardy',
  'Kalif Raymond',
  'Zay Jones',
  'Zay Jones',
  'Trevor Davis',
  'Trevor Davis',
  'Trevor Davis',
  'Cordarrelle Patterson',
  'Keelan Doss',
  'Damion Ratley',
  'Mack Hollins',
  'Mack Hollins',
  'Devin Smith',
  'Dontrelle Inman',
  'Dontrelle Inman',
  'Christian Blake',
  'Duke Williams',
  'Antonio Callaway',
  'Olamide Zaccheaus',
  'Rashard Higgins',
  'Braxton Berrios',
  'DeAndre Carter',
  'Robert Foster',
  'Deon Cain',
  'Deon Cain',
  'Trent Sherfield',
  'Andre Patton',
  'Chris Hogan',
  'Ryan Switzer',
  'Diontae Spencer',
  'Deonte Harris',
  'Chad Beebe',
  'Marcell Ateman',
  'Travis Benjamin',
  'KhaDarel Hodge',
  'Ventell Bryant',
  'Geremy Davis',
  'Jason Moore',
  'Devin Funchess',
  'Johnny Holton',
  'Cody Core',
  'Donte Moncrief',
  'Andre Roberts',
  'Russell Shepard',
  'Gunner Olszewski',
  'Ryan Grant',
  'C.J. Board',
  'Chris Moore',
  'Marqise Lee',
  'Stanley Morgan',
  'Riley Ridley',
  'Fred Brown',
  'DeAndrew White',
  'Josh Bellamy',
  'Ashton Dulin',
  'Michael Walker',
  'Mike Thomas',
  'Brandon Zylstra',
  'Austin Carr',
  'Dwayne Harris',
  'Krishawn Hogan',
  'Quincy Enunwa',
  'Greg Dortch',
  'JoJo Natson',
  'Juwann Winfree',
  'Matthew Slater',
  'Taywan Taylor'],
 'rec_yards': [1688,
  1333,
  1165,
  1316,
  1062,
  1091,
  1118,
  1117,
  1097,
  1175,
  1157,
  1065,
  1092,
  1130,
  1006,
  987,
  1060,
  1067,
  1060,
  866,
  927,
  919,
  904,
  778,
  1009,
  799,
  767,
  759,
  614,
  700,
  690,
  963,
  963,
  761,
  649,
  626,
  737,
  747,
  651,
  569,
  665,
  608,
  735,
  588,
  662,
  537,
  597,
  651,
  490,
  561,
  429,
  418,
  511,
  508,
  546,
  503,
  543,
  557,
  575,
  477,
  367,
  513,
  358,
  363,
  428,
  425,
  353,
  378,
  374,
  408,
  416,
  433,
  433,
  292,
  260,
  270,
  245,
  294,
  288,
  359,
  326,
  332,
  271,
  247,
  271,
  287,
  139,
  178,
  286,
  285,
  247,
  232,
  219,
  198,
  198,
  220,
  179,
  176,
  187,
  245,
  193,
  189,
  164,
  127,
  261,
  212,
  132,
  186,
  163,
  159,
  189,
  204,
  170,
  109,
  184,
  169,
  76,
  155,
  170,
  126,
  69,
  83,
  28,
  0,
  83,
  133,
  136,
  125,
  0,
  113,
  132,
  43,
  91,
  58,
  89,
  93,
  55,
  104,
  97,
  64,
  72,
  52,
  80,
  56,
  53,
  27,
  31,
  24,
  70,
  70,
  30,
  57,
  15,
  38,
  43,
  32,
  21,
  28,
  18,
  20,
  25,
  34,
  14,
  31,
  21,
  18,
  18,
  15,
  21,
  20,
  20,
  17,
  15,
  14,
  10,
  9,
  7,
  4,
  -4,
  0,
  0,
  0,
  0,
  0],
 'team': ['NO',
  'TB',
  'HOU',
  'ATL',
  'LA',
  'NE',
  'DET',
  'LAC',
  'DAL',
  'CAR',
  'TB',
  'MIA',
  'CLE',
  'MIN',
  'SEA',
  'CIN',
  'BUF',
  'LA',
  'DEN',
  'ATL',
  'TEN',
  'WAS',
  'GB',
  'BUF',
  'DAL',
  'KC',
  'NYJ',
  'ARI',
  'CAR',
  'SF',
  'NYG',
  'LAC',
  'LAC',
  'NYJ',
  'ARI',
  'PIT',
  'JAC',
  'DAL',
  'OAK',
  'BAL',
  'KC',
  'NYG',
  'PIT',
  'JAC',
  'DET',
  'NYG',
  'IND',
  'CHI',
  'PHI',
  'HOU',
  'IND',
  'MIN',
  'TB',
  'KC',
  'PIT',
  'OAK',
  'LA',
  'TEN',
  'CIN',
  'SF',
  'DEN',
  'CIN',
  'SF',
  'PHI',
  'MIA',
  'KC',
  'CHI',
  'ATL',
  'TEN',
  'GB',
  'MIA',
  'NYJ',
  'GB',
  'MIA',
  'MIN',
  'GB',
  'WAS',
  'JAC',
  'NYG',
  'NE',
  'LA',
  'WAS',
  'BAL',
  'BUF',
  'SEA',
  'NE',
  'SEA',
  'NO',
  'CAR',
  'ARI',
  'HOU',
  'DEN',
  'ARI',
  'WAS',
  'BAL',
  'SEA',
  'IND',
  'DAL',
  'ARI',
  'SEA',
  'NYG',
  'NYJ',
  'MIA',
  'IND',
  'DET',
  'GB',
  'TB',
  'SF',
  'CHI',
  'PHI',
  'ARI',
  'DEN',
  'KC',
  'SF',
  'MIN',
  'PHI',
  'NE',
  'ATL',
  'TEN',
  'OAK',
  'BUF',
  'OAK',
  'GB',
  'MIA',
  'CHI',
  'OAK',
  'CLE',
  'PHI',
  'MIA',
  'DAL',
  'LAC',
  'IND',
  'ATL',
  'BUF',
  'CLE',
  'ATL',
  'CLE',
  'NYJ',
  'HOU',
  'BUF',
  'PIT',
  'IND',
  'ARI',
  'LAC',
  'CAR',
  'PIT',
  'DEN',
  'NO',
  'MIN',
  'OAK',
  'LAC',
  'CLE',
  'DAL',
  'LAC',
  'LAC',
  'IND',
  'PIT',
  'NYG',
  'PIT',
  'BUF',
  'NYG',
  'NE',
  'OAK',
  'JAC',
  'BAL',
  'JAC',
  'CIN',
  'CHI',
  'DEN',
  'CAR',
  'NYJ',
  'IND',
  'JAC',
  'LA',
  'CAR',
  'NO',
  'OAK',
  'NO',
  'NYJ',
  'CAR',
  'LA',
  'DEN',
  'NE',
  'CLE']}
  • Kindly post the expected output. Also, this doesn't look like a MultiIndex. If it is, sharing the data in dictionary format will make it much easier to reproduce. Commented Aug 15, 2020 at 23:17
  • I will edit the post. Commented Aug 15, 2020 at 23:18
  • Check again @sammywemmy, I hope this is explained better. Commented Aug 15, 2020 at 23:24
  • It would be easier for us if you showed your source dataframe contents as a dictionary. You can do that using this: df.to_dict('list') Commented Aug 15, 2020 at 23:25
  • Posted the dictionary of my df @Bill Commented Aug 15, 2020 at 23:29

2 Answers

2

I think this should work:

# Make a sorted multi-index
df_sorted = df.set_index(['team','rec_yards']).sort_index(ascending=False)

# Add a dummy column containing all 1s
df_sorted['count'] = 1

# Turn it into a ranking by team
df_sorted['count'] = df_sorted.groupby('team')['count'].cumsum()       

# Only keep the first n rows per team
n = 4
df_sorted = df_sorted.loc[df_sorted['count'] <= n]

# Re-arrange it however you prefer
df_sorted.reset_index().set_index(['team','count'])

Output:

            rec_yards        full_name
team count                            
WAS  1            919   Terry McLaurin
     2            332    Kelvin Harmon
     3            245  Paul Richardson
     4            198       Trey Quinn
TEN  1            927       A.J. Brown
     2            557      Corey Davis
     3            374   Adam Humphries
     4            170    Kalif Raymond
TB   1           1333     Chris Godwin
     2           1157       Mike Evans
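
For comparison, here is a minimal sketch of the same ranking idea that avoids the dummy column by using cumcount. It runs on the ten-row sample shown at the top of the question rather than the full dictionary, and the names ordered, top_n and rank are only illustrative.

```python
import pandas as pd

# The ten-row sample from the top of the question (not the full dataset)
df = pd.DataFrame({
    "full_name": ["Michael Thomas", "Chris Godwin", "DeAndre Hopkins",
                  "Julio Jones", "Cooper Kupp", "Adam Thielen",
                  "Julian Edelman", "Stefon Diggs", "Alshon Jeffery",
                  "AJ Green"],
    "team": ["NO"] * 5 + ["LA"] * 5,
    "rec_yards": [1688, 1333, 1165, 1316, 800, 1165, 1316, 1062, 1250, 800],
})

n = 4

# Sort so that, within each team, the largest rec_yards come first
ordered = df.sort_values(["team", "rec_yards"], ascending=[True, False])

# cumcount numbers the rows of each team 0, 1, 2, ... in their current order
ordered["rank"] = ordered.groupby("team").cumcount() + 1

# Keep the first n rows per team and index by (team, rank)
top_n = ordered[ordered["rank"] <= n].set_index(["team", "rank"])
print(top_n)
```

Ties are broken by row order after sorting, so two players with equal rec_yards still get distinct ranks.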

2

You can also use the head method with df.groupby:

df.sort_values(['team','rec_yards'], ascending=False).groupby('team').head(4)

Output:

           full_name  rec_yards team
21    Terry McLaurin        919  WAS
81     Kelvin Harmon        332  WAS
76   Paul Richardson        245  WAS
93        Trey Quinn        198  WAS
20        A.J. Brown        927  TEN
57       Corey Davis        557  TEN
68    Adam Humphries        374  TEN
118    Kalif Raymond        170  TEN
1       Chris Godwin       1333   TB
10        Mike Evans       1157   TB

(The rows keep their original index labels, so you may want to reset the index afterwards, for example with reset_index(drop=True).)
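
As a rough end-to-end sketch, the same call can be run on the ten-row sample from the top of the question, followed by an index reset. The variable name top4 is only illustrative, and the sample is not the full dictionary posted in the edit.

```python
import pandas as pd

# The ten-row sample from the top of the question (not the full dataset)
df = pd.DataFrame({
    "full_name": ["Michael Thomas", "Chris Godwin", "DeAndre Hopkins",
                  "Julio Jones", "Cooper Kupp", "Adam Thielen",
                  "Julian Edelman", "Stefon Diggs", "Alshon Jeffery",
                  "AJ Green"],
    "team": ["NO"] * 5 + ["LA"] * 5,
    "rec_yards": [1688, 1333, 1165, 1316, 800, 1165, 1316, 1062, 1250, 800],
})

top4 = (
    df.sort_values(["team", "rec_yards"], ascending=False)  # largest first within each team
      .groupby("team")
      .head(4)                   # first four rows of each team, in the sorted order
      .reset_index(drop=True)    # replace the original labels with 0..7
)
print(top4)
```

On this sample the result has NO before LA only because ascending=False also sorts the team names in reverse alphabetical order; with other team names the group order will differ from the order of first appearance.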

Ideally you could call nlargest directly on a DataFrame groupby, but as far as I can tell you can't: there is a SeriesGroupBy.nlargest but no DataFrameGroupBy.nlargest. You can get the same effect by passing DataFrame.nlargest to the GroupBy.apply method, but when I tested it, it was quite inefficient:

df.groupby('team').apply(pd.DataFrame.nlargest, 4, columns=['rec_yards'])

Output:

                 full_name  rec_yards team  count
team                                             
ARI  27   Larry Fitzgerald        759  ARI      1
     34     Christian Kirk        649  ARI      1
     89       Damiere Byrd        285  ARI      1
     92      Pharoh Cooper        219  ARI      1
ATL  3         Julio Jones       1316  ATL      1
     19      Calvin Ridley        866  ATL      1
     67       Russell Gage        378  ATL      1
     117      Justin Hardy        155  ATL      1
BAL  39     Marquise Brown        569  BAL      1
     82       Seth Roberts        271  BAL      1

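Note: The second level of the multi-index here is the original index value in df. The count column in this output is not part of the question's df; it appears to be left over from earlier experimentation on the same dataframe.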
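
Since SeriesGroupBy.nlargest does exist, one possible workaround is to take the top values of the rec_yards column per team and use the original index labels it returns to select full rows. This is only a sketch: it assumes df has a unique index, it uses the ten-row sample from the top of the question, and the names top_idx and top4 are illustrative. I have not benchmarked it against the other approaches.

```python
import pandas as pd

# The ten-row sample from the top of the question (not the full dataset)
df = pd.DataFrame({
    "full_name": ["Michael Thomas", "Chris Godwin", "DeAndre Hopkins",
                  "Julio Jones", "Cooper Kupp", "Adam Thielen",
                  "Julian Edelman", "Stefon Diggs", "Alshon Jeffery",
                  "AJ Green"],
    "team": ["NO"] * 5 + ["LA"] * 5,
    "rec_yards": [1688, 1333, 1165, 1316, 800, 1165, 1316, 1062, 1250, 800],
})

# nlargest on the grouped Series returns a Series whose MultiIndex is
# (team, original row label); the last level identifies the rows to keep
top_idx = df.groupby("team")["rec_yards"].nlargest(4).index.get_level_values(-1)

# Select the full rows by their original labels (assumes a unique index)
top4 = df.loc[top_idx]
print(top4)
```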
