I regularly end up in a scenario where I have a dataframe with a three-level MultiIndex. I reduce that dataframe to two levels (for instance, to get the mean or the size of each group), make a subselection of those aggregated values, and then want to select the matching rows from the original dataframe.
I just can't get this to work. I have tried slicing, .loc (which raises an error) and a few other things, without success.
How can I do this? Example:
import pandas as pd
import numpy as np
# The Random columns are unseeded, so the printed output below is from one particular run
df1 = pd.DataFrame.from_dict({'Alpha': 'a a b b c'.split(),
                              'Word': 'one one three two three'.split(),
                              'AnotherWord': 'alpha alpa beta bèta gamma'.split(),
                              'Random1': list(np.random.randint(0, 20, 5)),
                              'Random2': list(np.random.randint(0, 200, 5)),
                              'Random3': list(np.random.randint(0, 100, 5))})
df1.set_index(['Alpha', 'Word', 'AnotherWord'], inplace=True)
>>> df1
                         Random1  Random2  Random3
Alpha Word  AnotherWord
a     one   alpha              9      123       34
            alpa              18        9       77
b     three beta              10      110       33
      two   bèta              11      153       88
c     three gamma              9      130        6
filtered = df1.groupby(['Alpha', 'Word']).size()
>>> filtered
Alpha  Word
a      one      2
b      three    1
       two      1
c      three    1
dtype: int64
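Boolean indexing does work on the reduced two-level Series itself, and the index of the selection then holds the (Alpha, Word) pairs that qualify. A minimal sketch, rebuilding df1 from the values printed above instead of random numbers:

```python
import pandas as pd

# df1 rebuilt with the values shown in the printout above (not random)
df1 = pd.DataFrame({
    'Alpha': 'a a b b c'.split(),
    'Word': 'one one three two three'.split(),
    'AnotherWord': 'alpha alpa beta bèta gamma'.split(),
    'Random1': [9, 18, 10, 11, 9],
    'Random2': [123, 9, 110, 153, 130],
    'Random3': [34, 77, 33, 88, 6],
}).set_index(['Alpha', 'Word', 'AnotherWord'])

# Two-level Series: number of rows per (Alpha, Word) pair
filtered = df1.groupby(['Alpha', 'Word']).size()

# Boolean indexing on the reduced Series keeps only the groups of size 1
singles = filtered[filtered == 1]

# Its index is a two-level MultiIndex of the qualifying (Alpha, Word) pairs
keys = singles.index.tolist()
print(keys)  # [('b', 'three'), ('b', 'two'), ('c', 'three')]
```

The open question is how to carry those two-level keys back to the three-level df1.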
Now I want to keep only the rows of df1 whose (Alpha, Word) group satisfies filtered == 1.
The result should be:
                         Random1  Random2  Random3
Alpha Word  AnotherWord
b     three beta              10      110       33
      two   bèta              11      153       88
c     three gamma              9      130        6
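One way to get that result (a sketch, not the only option) is to drop the innermost level of df1's index so each row is labelled by its (Alpha, Word) pair only, and then test membership against the qualifying pairs with MultiIndex.isin. The data reuses the printed values rather than random numbers. For a pure group-size condition, DataFrameGroupBy.filter can do the same in one step, since it returns the original rows of every group for which the function returns True.

```python
import pandas as pd

# df1 rebuilt with the values shown in the printout above (not random)
df1 = pd.DataFrame({
    'Alpha': 'a a b b c'.split(),
    'Word': 'one one three two three'.split(),
    'AnotherWord': 'alpha alpa beta bèta gamma'.split(),
    'Random1': [9, 18, 10, 11, 9],
    'Random2': [123, 9, 110, 153, 130],
    'Random3': [34, 77, 33, 88, 6],
}).set_index(['Alpha', 'Word', 'AnotherWord'])

filtered = df1.groupby(['Alpha', 'Word']).size()

# (Alpha, Word) pairs whose group has exactly one row
keep = filtered[filtered == 1].index

# Label each row of df1 by its (Alpha, Word) pair only, then test membership
mask = df1.index.droplevel('AnotherWord').isin(keep)
result = df1[mask]
print(result)

# Alternative for size-based conditions: keep whole groups directly
result2 = df1.groupby(level=['Alpha', 'Word']).filter(lambda g: len(g) == 1)
```

The isin route works for any condition on the aggregated Series (a mean threshold, for example), while filter is shorter when the condition can be written per group.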
In this example I have not performed any filtering yet, but I do also want to add the aggregated data back to df1.
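To attach an aggregated value to every row of df1, one option is to reindex the two-level result by each row's (Alpha, Word) pair; the new column can then be filtered like any other. A sketch, where the column names group_size and Random1_mean are hypothetical choices for illustration:

```python
import pandas as pd

# df1 rebuilt with the values shown in the printout above (not random)
df1 = pd.DataFrame({
    'Alpha': 'a a b b c'.split(),
    'Word': 'one one three two three'.split(),
    'AnotherWord': 'alpha alpa beta bèta gamma'.split(),
    'Random1': [9, 18, 10, 11, 9],
    'Random2': [123, 9, 110, 153, 130],
    'Random3': [34, 77, 33, 88, 6],
}).set_index(['Alpha', 'Word', 'AnotherWord'])

filtered = df1.groupby(['Alpha', 'Word']).size()

# Each row's (Alpha, Word) pair; repeated pairs are fine as a reindex target
# because the index of `filtered` itself is unique
pairs = df1.index.droplevel('AnotherWord')

# Look up the group size per row; to_numpy() avoids index alignment on assignment
df1['group_size'] = filtered.reindex(pairs).to_numpy()

# Any other aggregate works the same way, e.g. the group mean of Random1
means = df1.groupby(['Alpha', 'Word'])['Random1'].mean()
df1['Random1_mean'] = means.reindex(pairs).to_numpy()

# The added column can now be filtered directly
print(df1[df1['group_size'] == 1])
```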