Use groupby/agg to group by both the team and grade, and the aggregate the cost, area and size columns. Note that agg can accept a dict whose keys are column names and whose values are aggregation functions (such as mean or sum). Thus you can specify aggregation functions on a per-column basis.
In [120]: df.groupby(['team', 'grade']).agg({'cost':'mean', 'area':'mean', 'size':'sum'}).rename(columns={'cost':'mean_cost', 'area':'mean_area'})
Out[120]:
size mean_cost mean_area
team grade
arsenal 1 973 62000.000000 46.000000
2 3110 40466.666667 58.666667
chelsea 3 957 52000.000000 42.000000
4 849 62000.000000 30.000000
man utd 1 2445 56950.000000 12.500000
2 2579 47650.000000 66.000000
groupby returns an iterable. Therefore, to make a dict mapping team names to DataFrames you could use:
dfs = {team:grp for team, grp in result.reset_index().groupby('team')}
For example,
import pandas as pd
df = pd.DataFrame(
{'area': [5, 42, 20, 20, 43, 78, 89, 30, 46, 78],
'cost': [52300, 52000, 25000, 61600, 43000, 23400, 52300, 62000, 62000, 73000],
'grade': [1, 3, 2, 1, 2, 2, 2, 4, 1, 2], 'size': [1045, 957, 1099, 1400, 1592, 1006, 987, 849, 973, 1005],
'team': ['man utd', 'chelsea', 'arsenal', 'man utd', 'man utd', 'arsenal', 'man utd', 'chelsea', 'arsenal', 'arsenal']})
result = df.groupby(['team', 'grade']).agg({'cost':'mean', 'area':'mean', 'size':'sum'}).rename(columns={'cost':'mean_cost', 'area':'mean_area'})
dfs = {team:grp.drop('team', axis=1)
for team, grp in result.reset_index().groupby('team')}
for team, grp in dfs.items():
print('{}:\n{}\n'.format(team, grp))
yields
chelsea:
grade mean_cost mean_area size
2 3 52000 42 957
3 4 62000 30 849
arsenal:
grade mean_cost mean_area size
0 1 62000.000000 46.000000 973
1 2 40466.666667 58.666667 3110
man utd:
grade mean_cost mean_area size
4 1 56950 12.5 2445
5 2 47650 66.0 2579
Beware that for better performance try to avoid breaking up
DataFrames into smaller DataFrames, because once you use a dict or a list, you
are forced to use Python loops instead of the faster implicit C-compiled loops
used by Pandas/NumPy methods.
So for computation try to stick with the result DataFrame. Use the dfs dict
only if you have to do something like print the DataFrames separately.