How to apply functions to NumPy arrays using pandas groupby

Question

I'm very new to pandas, so I hope this has an easy answer (I'd also appreciate any pointers on the setup of the DataFrame itself).

So let's say I have the following DataFrame:

D = pd.DataFrame({ i:{ "name":str(i),
                       "vector": np.arange(i,i+10),
                       "sq":i**2,
                       "gp":i%3 } for i in range(10) }).T

    gp  name sq  vector
0    0   0   0   [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
1    1   1   1   [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
2    2   2   4   [2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
3    0   3   9   [3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
4    1   4   16  [4, 5, 6, 7, 8, 9, 10, 11, 12, 13]
5    2   5   25  [5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
6    0   6   36  [6, 7, 8, 9, 10, 11, 12, 13, 14, 15]
7    1   7   49  [7, 8, 9, 10, 11, 12, 13, 14, 15, 16]
8    2   8   64  [8, 9, 10, 11, 12, 13, 14, 15, 16, 17]
9    0   9   81  [9, 10, 11, 12, 13, 14, 15, 16, 17, 18]

Now I would like to group by "gp" and get the mean of the "vector" column.

I've tried

D.groupby('gp').mean()

and even

D.groupby('gp').agg( np.mean )

but I get an error that there were no "numeric types" to be aggregated. So do np.arrays not work in pandas?
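A possible explanation for the error, sketched below: building the frame from a dict of dicts and then transposing tends to leave the columns with object dtype, so the numeric aggregation finds nothing it recognizes as numeric. The `astype` step is an assumption about how one might restore numeric dtypes for the scalar columns; it is not part of the original question.

```python
import numpy as np
import pandas as pd

D = pd.DataFrame({i: {"name": str(i),
                      "vector": np.arange(i, i + 10),
                      "sq": i ** 2,
                      "gp": i % 3} for i in range(10)}).T

# After the transpose, even the integer columns are stored as object dtype.
dtypes_before = D.dtypes
print(dtypes_before)

# Restore numeric dtypes for the scalar columns (the "vector" column stays object).
D = D.astype({"sq": "int64", "gp": "int64"})
sq_means = D.groupby("gp")["sq"].mean()
print(sq_means)
```

With numeric dtypes restored, scalar columns such as "sq" aggregate normally; the array-valued "vector" column still needs separate handling.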

FooBar · Accepted Answer · 2014-05-27 08:15:45Z

3

For me it works:

D.groupby('gp').apply(lambda x: x.vector.mean().mean())

I'm taking the mean twice, since you want the mean group value for the mean of the vector (don't you?).

Out[98]: 
gp
0     9.0
1     8.5
2     9.5
dtype: float64

If you want the mean vector, just take the mean once.
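As a rough sketch of the "mean vector" case, one alternative to calling `.mean()` on a column of arrays is to stack each group's arrays into a 2-D array and average along the first axis. The name `mean_vectors` is only illustrative, and this is one way to do it rather than the approach used above.

```python
import numpy as np
import pandas as pd

D = pd.DataFrame({i: {"name": str(i),
                      "vector": np.arange(i, i + 10),
                      "sq": i ** 2,
                      "gp": i % 3} for i in range(10)}).T

# For each group, stack its vectors into shape (n_rows, 10) and average over rows.
mean_vectors = {
    gp: np.stack(group["vector"].tolist()).mean(axis=0)
    for gp, group in D.groupby("gp")
}

# Optional: one row per group, one column per vector position.
mean_table = pd.DataFrame.from_dict(mean_vectors, orient="index")
print(mean_table)
```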


1 Comment

Magellan88 Over a year ago

Well no, I actually wanted the mean vector (but that's on me; I did not ask the question precisely enough). Still, your solution works well for me (using only one mean), so thanks.

HYRY · Accepted Answer · 2014-05-27 08:22:32Z

2

Storing arrays in cells is not a good idea. You can convert the vector column into multiple columns:

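import numpy as np
import pandas as pd

D = pd.DataFrame({ i:{ "name":str(i),
                       "vector": np.arange(i,i+10),
                       "sq":i**2,
                       "gp":i%3 } for i in range(10) }).T
# Expand each array into its own columns and nest them under the "vector" key.
vectors = pd.DataFrame(D.vector.tolist(), index=D.index)
df = pd.concat([D[["gp", "name", "sq"]], vectors], axis=1, keys=["attrs", "vector"])
# numeric_only=True skips the object-dtype "attrs" columns (needed on recent pandas).
print(df.groupby([("attrs", "gp")]).mean(numeric_only=True))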

Here is the output:

                  vector                                                  
                  0    1    2    3    4     5     6     7     8     9
(attrs, gp)                                                          
0               4.5  5.5  6.5  7.5  8.5   9.5  10.5  11.5  12.5  13.5
1               4.0  5.0  6.0  7.0  8.0   9.0  10.0  11.0  12.0  13.0
2               5.0  6.0  7.0  8.0  9.0  10.0  11.0  12.0  13.0  14.0


1 Comment

Magellan88 Over a year ago

Woo, that looks sophisticated. I guess I'll have to digest this for a little while. I guess this is one of those concepts that look very difficult at first but turn out to be extremely useful and pythonic once you understand them, right? Sadly, my main problem is more complicated: the "vector" is supposed to be a 3D array of the same size for every row, and I want its mean grouped by different criteria.
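For the 3D case described in this comment, a minimal sketch under stated assumptions: the arrays are kept outside the DataFrame in a single 4-D NumPy array, and pandas is used only to work out which rows belong to which group. The names `data` and `meta` and the shape `(2, 3, 4)` are hypothetical, chosen purely for illustration.

```python
import numpy as np
import pandas as pd

shape = (2, 3, 4)  # hypothetical shape of each row's 3D array
n_rows = 9

# One 4-D array holding every row's 3D array; row i is filled with the value i.
data = np.stack([np.full(shape, float(i)) for i in range(n_rows)])

# The DataFrame carries only the scalar attributes used for grouping.
meta = pd.DataFrame({"gp": [i % 3 for i in range(n_rows)]})

# GroupBy.indices maps each group key to the integer positions of its rows.
group_means = {
    key: data[idx].mean(axis=0)
    for key, idx in meta.groupby("gp").indices.items()
}

print({key: value.shape for key, value in group_means.items()})
```

Grouping by a different criterion would then only change the column passed to `groupby`, while the array data stays untouched.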
