4

I'm very new to pandas, so I hope this has an easy answer (I'd also appreciate any pointers on how the DataFrame itself is set up).

So let's say I have the following DataFrame:

import numpy as np
import pandas as pd

D = pd.DataFrame({ i:{ "name":str(i),
                       "vector": np.arange(i,i+10),
                       "sq":i**2,
                       "gp":i%3 } for i in range(10) }).T

    gp  name sq  vector
0    0   0   0   [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
1    1   1   1   [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
2    2   2   4   [2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
3    0   3   9   [3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
4    1   4   16  [4, 5, 6, 7, 8, 9, 10, 11, 12, 13]
5    2   5   25  [5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
6    0   6   36  [6, 7, 8, 9, 10, 11, 12, 13, 14, 15]
7    1   7   49  [7, 8, 9, 10, 11, 12, 13, 14, 15, 16]
8    2   8   64  [8, 9, 10, 11, 12, 13, 14, 15, 16, 17]
9    0   9   81  [9, 10, 11, 12, 13, 14, 15, 16, 17, 18]

Now I would like to group by "gp" and get the mean of the "vector" column within each group.

I've tried

D.groupby('gp').mean()

and even

D.groupby('gp').agg( np.mean )

but both raise an error saying that there are no "numeric types" to aggregate. So do NumPy arrays not work inside pandas?

2 Answers

3

For me it works:

D.groupby('gp').apply(lambda x: x.vector.mean().mean())

I'm taking the mean twice, since you want the mean group value for the mean of the vector (don't you?).

Out[98]: 
gp
0     9.0
1     8.5
2     9.5
dtype: float64

If you want the mean vector, just take the mean once.
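To make the "mean once" variant explicit, here is a minimal sketch that computes one mean vector per group. It is not the exact expression from above: it assumes the frame is built from a list of row dicts (so "gp" and "sq" keep an integer dtype instead of becoming object columns after `.T`), and it stacks each group's arrays with `np.stack` before averaging. The name `mean_vectors` is only illustrative.

```python
import numpy as np
import pandas as pd

# Assumption: build the frame row by row; "vector" stays an object column
# holding one NumPy array per cell, the other columns keep numeric dtypes.
D = pd.DataFrame([{"name": str(i),
                   "vector": np.arange(i, i + 10),
                   "sq": i ** 2,
                   "gp": i % 3} for i in range(10)])

# For each group, stack its arrays into a 2-D array (rows x 10)
# and average over the rows to get the element-wise mean vector.
mean_vectors = {
    gp: np.stack(group["vector"].to_numpy()).mean(axis=0)
    for gp, group in D.groupby("gp")
}

print(mean_vectors[0])
```

The result is a plain dict mapping each "gp" value to a length-10 array, which sidesteps the question of how pandas wraps array-valued results from `apply`.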


1 Comment

Well, no, I actually wanted the mean vector (but that's on me; I did not ask the question precisely enough). Still, your solution works well for me (using only one mean), so thanks.
2

Storing arrays in cells is not a good idea; you can instead expand the vector column into multiple columns:

D = pd.DataFrame({ i:{ "name":str(i),
                       "vector": np.arange(i,i+10),
                       "sq":i**2,
                       "gp":i%3 } for i in range(10) }).T
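df = pd.concat([D[["gp", "name", "sq"]],
                pd.DataFrame(D.vector.tolist(), index=D.index)],
               axis=1, keys=["attrs", "vector"])
# numeric_only=True skips the non-numeric "attrs" columns, which newer pandas no longer drops silently
print(df.groupby([("attrs", "gp")]).mean(numeric_only=True))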

Here is the output:

                  vector                                                  
                  0    1    2    3    4     5     6     7     8     9
(attrs, gp)                                                          
0               4.5  5.5  6.5  7.5  8.5   9.5  10.5  11.5  12.5  13.5
1               4.0  5.0  6.0  7.0  8.0   9.0  10.0  11.0  12.0  13.0
2               5.0  6.0  7.0  8.0  9.0  10.0  11.0  12.0  13.0  14.0

1 Comment

Woo, that looks sophisticated. I guess I'll have to digest this for a little while. I suppose this is one of those concepts that look very difficult at first but turn out to be extremely useful and pythonic once you understand them, right? Sadly, my real problem is more complicated: the "vector" is supposed to be a 3D array of the same size in every row, and I want its mean over rows grouped by different criteria.
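For the 3D case raised in this comment, one possible approach (a sketch only, not something proposed in the answers) is to keep the arrays in an object column and average them per group with NumPy. The helper `grouped_array_mean`, the column names "cube" and "even", and the shape (2, 3, 4) are all hypothetical choices made for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data: each row holds a 3-D array of identical shape (2, 3, 4).
df3 = pd.DataFrame([{"gp": i % 3,
                     "even": i % 2 == 0,
                     "cube": np.full((2, 3, 4), float(i))} for i in range(10)])


def grouped_array_mean(frame, by, column):
    """Return {group key: element-wise mean of the arrays stored in `column`}."""
    return {
        # np.stack adds a leading axis (one entry per row); averaging over
        # axis 0 gives an array with the same shape as a single cell.
        key: np.stack(group[column].to_numpy()).mean(axis=0)
        for key, group in frame.groupby(by)
    }


# The same helper can be reused for different grouping criteria.
means_by_gp = grouped_array_mean(df3, "gp", "cube")
means_by_parity = grouped_array_mean(df3, "even", "cube")

print(means_by_gp[0].shape)
```

Because the helper only relies on every array in a group having the same shape, it behaves the same way for 1D, 2D, or 3D cells.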
