-1

I'm trying to do something well beyond my Pandas level and have spent far too much time getting this wrong. In this example I need to return individual Dataframes for each of the teams. The dataframes would show the mean cost, mean area, and sum of size, for each grade.

Because I need to produce separate tables, I probably need to pass single team names into a function over and over. To be clear, I'm happy to pass the team names into a function (or similar) manually to produce each table.

    team      grade   cost   area   size
0   man utd   1       52300  5      1045
1   chelsea   3       52000  42     957
2   arsenal   2       25000  20     1099
3   man utd   1       61600  20     1400
4   man utd   2       43000  43     1592
5   arsenal   2       23400  78     1006
6   man utd   2       52300  89     987
7   chelsea   4       62000  30     849
8   arsenal   1       62000  46     973
9   arsenal   2       73000  78     1005

The man utd dataframe would look like this for example:

grade  mean_cost    mean_area   size
1      56590        12.5        2445
2      47650        66          2579
1
  • 1
    And...? What is your question? Surely you have code, but you're stuck somewhere, so give us something specific to help you with. Commented Apr 18, 2016 at 21:03

1 Answer 1

1

Use groupby/agg to group by both the team and grade, and the aggregate the cost, area and size columns. Note that agg can accept a dict whose keys are column names and whose values are aggregation functions (such as mean or sum). Thus you can specify aggregation functions on a per-column basis.

In [120]: df.groupby(['team', 'grade']).agg({'cost':'mean', 'area':'mean', 'size':'sum'}).rename(columns={'cost':'mean_cost', 'area':'mean_area'})
Out[120]: 
               size     mean_cost  mean_area
team    grade                               
arsenal 1       973  62000.000000  46.000000
        2      3110  40466.666667  58.666667
chelsea 3       957  52000.000000  42.000000
        4       849  62000.000000  30.000000
man utd 1      2445  56950.000000  12.500000
        2      2579  47650.000000  66.000000

groupby returns an iterable. Therefore, to make a dict mapping team names to DataFrames you could use:

dfs = {team:grp for team, grp in result.reset_index().groupby('team')}

For example,

import pandas as pd

df = pd.DataFrame(
    {'area': [5, 42, 20, 20, 43, 78, 89, 30, 46, 78],
     'cost': [52300, 52000, 25000, 61600, 43000, 23400, 52300, 62000, 62000, 73000], 
     'grade': [1, 3, 2, 1, 2, 2, 2, 4, 1, 2], 'size': [1045, 957, 1099, 1400, 1592, 1006, 987, 849, 973, 1005], 
     'team': ['man utd', 'chelsea', 'arsenal', 'man utd', 'man utd', 'arsenal', 'man utd', 'chelsea', 'arsenal', 'arsenal']})

result =  df.groupby(['team', 'grade']).agg({'cost':'mean', 'area':'mean', 'size':'sum'}).rename(columns={'cost':'mean_cost', 'area':'mean_area'})

dfs = {team:grp.drop('team', axis=1) 
       for team, grp in result.reset_index().groupby('team')}

for team, grp in dfs.items():
    print('{}:\n{}\n'.format(team, grp))

yields

chelsea:
   grade  mean_cost  mean_area  size
2      3      52000         42   957
3      4      62000         30   849

arsenal:
   grade     mean_cost  mean_area  size
0      1  62000.000000  46.000000   973
1      2  40466.666667  58.666667  3110

man utd:
   grade  mean_cost  mean_area  size
4      1      56950       12.5  2445
5      2      47650       66.0  2579

Beware that for better performance try to avoid breaking up DataFrames into smaller DataFrames, because once you use a dict or a list, you are forced to use Python loops instead of the faster implicit C-compiled loops used by Pandas/NumPy methods.

So for computation try to stick with the result DataFrame. Use the dfs dict only if you have to do something like print the DataFrames separately.

Sign up to request clarification or add additional context in comments.

4 Comments

Most insightful and informative answer I've ever had on Stack Overflow. Thanks.
Is there also a way to print the dataframes individually in the pandas table style. Ideally need to screenshot them for presentation purposes.
To help build Stackoverflow's stable of good questions paired with good answers, please ask this as a new question.
Agreed: question is here

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.