Given a MultiIndexed pandas DataFrame whose cells contain NumPy arrays, I would like to know how to get the mean of each column, either over the whole frame or for a given index level.
>>> import numpy as np
>>> import pandas as pd
>>> pd.__version__
'1.0.5'
>>> a = np.array(range(20)).reshape(-1,2)
>>> d = pd.concat([pd.DataFrame({(i%len(a)//2,i%2): {'a': np.array(v), 'b': np.array([4,4])}}).T for i, v in enumerate(a)])
>>> d
            a       b
0 0    [0, 1]  [4, 4]
  1    [2, 3]  [4, 4]
1 0    [4, 5]  [4, 4]
  1    [6, 7]  [4, 4]
2 0    [8, 9]  [4, 4]
  1  [10, 11]  [4, 4]
3 0  [12, 13]  [4, 4]
  1  [14, 15]  [4, 4]
4 0  [16, 17]  [4, 4]
  1  [18, 19]  [4, 4]
>>> d['a'].mean()
array([ 9., 10.])
>>> d['b'].mean()
array([4., 4.])
So far so good.
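For reference, here is a more readable way to build the same frame, sketched on the assumption that the index layout shown above (outer level 0 to 4, inner level 0 to 1) is what the one-liner produces. It also shows that stacking a column's per-cell arrays with `np.stack` gives an ordinary 2-D numeric array whose column-wise mean has the same values as the `d['a'].mean()` result above.

```python
import numpy as np
import pandas as pd

# Same data as the one-liner above: ten two-element arrays in column 'a',
# a constant [4, 4] array in column 'b', indexed by (outer, inner).
a = np.arange(20).reshape(-1, 2)
index = pd.MultiIndex.from_product([range(5), range(2)])
d = pd.DataFrame({'a': list(a), 'b': [np.array([4, 4]) for _ in a]}, index=index)

# Stack the per-cell arrays of one column into a (10, 2) numeric array
stacked = np.stack(d['a'].to_numpy())
print(stacked.shape)         # (10, 2)
print(stacked.mean(axis=0))  # [ 9. 10.]
```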
Problem
The problem arises when I call .mean() on all columns at once or on a given level of the index.
When we take the mean of the whole DataFrame instead of a single d[<column>] Series, we only get the mean of the first element of each NumPy array:
>>> d.mean()
a    9.0
b    4.0
Name: 0, dtype: float64
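A possible workaround for the whole-frame case, assuming every cell in a column holds a numeric array of the same length: reduce each column on its own by stacking its arrays and averaging along axis 0, then collect the per-column results in an object-dtype Series.

```python
import numpy as np
import pandas as pd

a = np.arange(20).reshape(-1, 2)
index = pd.MultiIndex.from_product([range(5), range(2)])
d = pd.DataFrame({'a': list(a), 'b': [np.array([4, 4]) for _ in a]}, index=index)

# Reduce each column separately: stack its arrays into a 2-D array and
# average along axis 0, so every array position gets its own mean.
means = pd.Series({col: np.stack(d[col].to_numpy()).mean(axis=0)
                   for col in d.columns})
print(means)
```

Here `means['a']` holds the values [9., 10.] and `means['b']` holds [4., 4.].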
And we get errors when requesting a specific index level:
>>> d.mean(level=0)
Traceback (most recent call last):
[ ... ]
pandas.core.base.DataError: No numeric types to aggregate
>>> d['a'].mean(level=1)
Traceback (most recent call last):
[ ... ]
pandas.core.base.DataError: No numeric types to aggregate
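For a single column, one way around the DataError is to skip pandas' numeric aggregation entirely: iterate over the groups of the requested level and average each group's stacked arrays with NumPy. `level_mean` is a hypothetical helper name, and the sketch assumes equal-length arrays within the column.

```python
import numpy as np
import pandas as pd

a = np.arange(20).reshape(-1, 2)
index = pd.MultiIndex.from_product([range(5), range(2)])
d = pd.DataFrame({'a': list(a), 'b': [np.array([4, 4]) for _ in a]}, index=index)

def level_mean(s, level):
    """Hypothetical helper: element-wise mean of array cells per index level."""
    # Iterating a GroupBy yields (key, sub-Series) pairs without running any
    # numeric aggregation, so object-dtype cells are not a problem here.
    return pd.Series({key: np.stack(group.to_numpy()).mean(axis=0)
                      for key, group in s.groupby(level=level)})

result = level_mean(d['a'], level=1)
print(result)
```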
Expected output
>>> d.mean()
a    [9., 10.]
b     [4., 4.]
>>> d.mean(level=0)
          a       b
0    [1, 2]  [4, 4]
1    [5, 6]  [4, 4]
2   [9, 10]  [4, 4]
3  [13, 14]  [4, 4]
4  [17, 18]  [4, 4]
>>> d['a'].mean(level=1)
0      [8, 9]
1    [10, 11]
I know that pandas doesn't claim to handle NumPy arrays in cells very well, and this looks like a pandas bug to me, but how can I work around it?
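A more general sketch, assuming every column holds equal-length numeric arrays: expand each column into a plain numeric DataFrame (one column per array position), let pandas do the grouping and averaging, then pack each row of results back into an array cell. It uses `groupby(level=...)` rather than the `level=` argument of `mean()`, which was deprecated in pandas 1.3 and removed in 2.0. `array_mean` is a hypothetical name, not a pandas API.

```python
import numpy as np
import pandas as pd

def array_mean(df, level=None):
    """Hypothetical helper: mean of array-valued cells, optionally per index level.

    Returns an object Series (level=None) or DataFrame (level given)
    whose cells are NumPy arrays.
    """
    out = {}
    for col in df.columns:
        # One numeric column per array position, same index as df
        wide = pd.DataFrame(np.stack(df[col].to_numpy()), index=df.index)
        if level is None:
            out[col] = wide.mean().to_numpy()
        else:
            grouped = wide.groupby(level=level).mean()
            # Pack each row of group means back into a single array cell
            out[col] = pd.Series(list(grouped.to_numpy()), index=grouped.index)
    return pd.Series(out) if level is None else pd.DataFrame(out)

a = np.arange(20).reshape(-1, 2)
index = pd.MultiIndex.from_product([range(5), range(2)])
d = pd.DataFrame({'a': list(a), 'b': [np.array([4, 4]) for _ in a]}, index=index)

overall = array_mean(d)                        # like the expected d.mean()
by_outer = array_mean(d, level=0)              # like the expected d.mean(level=0)
by_inner = array_mean(d[['a']], level=1)['a']  # like d['a'].mean(level=1)
```

The packed cells come out as float arrays (for example [1., 2.] for group 0 of column 'a'), because the means are computed in floating point. Grouping the wide numeric frame lets pandas vectorize the averaging instead of looping over groups in Python.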