NumPy: Calculate mean of certain elements in array

Question

Assuming a 1-D array, is it possible to calculate the average over given groups of different sizes without looping? Instead of

avgs = [One_d_array[groups[i]].mean() for i in range(len(groups))]

Something like

avgs = np.mean(One_d_array, groups)

Basically I want to do this:

import numpy as np

M = np.arange(10000)
np.random.shuffle(M)
M.resize(100,100)
groups = np.random.randint(1, 10, 100)

def means(M, groups):
    means = []
    for i, label in enumerate(groups):
        # for row i, average the columns carrying each label other than the row's own
        means.extend([M[i][groups == j].mean() for j in set(groups).difference([label])])
    return means

This runs at

%timeit means(M, groups)
100 loops, best of 3: 12.2 ms per loop

A speed-up of 10 times or so would already be great.

unutbu · Accepted Answer · 2014-01-03 17:14:21Z

1

Whether you see a loop or not, there is a loop.
Here's one way, but the loop is simply hidden in the call to map:

In [10]: import numpy as np

In [11]: groups = [[1,2],[3,4,5]]

In [12]: list(map(np.mean, groups))  # on Python 3, map is lazy, so wrap it in list()
Out[12]: [1.5, 4.0]


3 Comments

embert Over a year ago

I guess you're right. However, is there any way to make the calculation of the mean faster? It's a bottleneck in several functions I'm dealing with.

unutbu Over a year ago

A NumPy array can be fast when you apply NumPy functions such as np.mean to a single large array. NumPy may not be very fast if you have to call np.mean on lots of small arrays. If you can't arrange your data into a single large array (maybe because the rows have different lengths) then you may be better off using plain Python lists than lots of small NumPy arrays. (It's hard to tell -- you have to benchmark with timeit.)
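As a rough illustration of the "single large array" idea, ragged index groups can be flattened once so that the arithmetic runs in one compiled call. This is only a sketch: the helper name group_means is made up for this example, it assumes every group is non-empty, and the flattening step still walks the list of groups in Python.

```python
import numpy as np

def group_means(values, groups):
    """Mean of values[g] for each index list g in groups (hypothetical helper)."""
    # Flatten the ragged index lists once; this is the only per-group Python work left.
    sizes = np.array([len(g) for g in groups])
    flat_idx = np.concatenate([np.asarray(g, dtype=np.intp) for g in groups])
    # group_ids[k] records which group flat_idx[k] belongs to.
    group_ids = np.repeat(np.arange(len(groups)), sizes)
    # One compiled call produces all per-group sums; divide by the group sizes.
    sums = np.bincount(group_ids, weights=values[flat_idx], minlength=len(groups))
    return sums / sizes  # assumes no group is empty (otherwise 0/0)

x = np.array([1, 2, 3, 4, 5])
means = group_means(x, [[0, 1, 2], [3, 4]])
print(means)
```

Whether this beats a plain list comprehension depends on the number and size of the groups, so it is worth timing both on real data.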

unutbu Over a year ago

If you are computing the mean of groups over and over again (with small changes to groups in between iterations) then it would be smart to keep running totals of the sum of each item in groups. That way you can update the totals as groups changes, and it is easy and quick to compute the new means.
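A minimal sketch of the running-totals idea, assuming groups are identified by integer ids and that each change is reported as a value being added, removed, or moved. The class and method names are hypothetical, not from any library.

```python
class RunningGroupMeans:
    """Keeps a running sum and count per group so a mean is a single division."""

    def __init__(self, n_groups):
        self.sums = [0.0] * n_groups
        self.counts = [0] * n_groups

    def add(self, group, value):
        # A value joins a group: update that group's totals only.
        self.sums[group] += value
        self.counts[group] += 1

    def remove(self, group, value):
        # A value leaves a group: undo its contribution.
        self.sums[group] -= value
        self.counts[group] -= 1

    def move(self, value, src, dst):
        # Moving a value touches two groups and nothing else.
        self.remove(src, value)
        self.add(dst, value)

    def mean(self, group):
        # Assumes the group is non-empty.
        return self.sums[group] / self.counts[group]

tracker = RunningGroupMeans(2)
for v in (1, 2, 3):
    tracker.add(0, v)
for v in (4, 5):
    tracker.add(1, v)
tracker.move(3, src=0, dst=1)  # groups are now [1, 2] and [4, 5, 3]
print(tracker.mean(0), tracker.mean(1))
```

With floating-point data, many incremental updates can accumulate rounding error, so an occasional full recomputation may be worthwhile.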

val · Accepted Answer · 2014-01-03 17:24:16Z

0

Another hidden loop is the use of np.vectorize:

>>> x = np.array([1,2,3,4,5])
>>> groups = [[0,1,2], [3,4]]
>>> np.vectorize(lambda group: np.mean(x[group]), otypes=[float])(groups)
array([ 2. , 4.5])
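For the labelled 2-D case in the question, one possible way to move the loop into compiled code is a matrix product with a label membership mask. This is a sketch under assumptions: the function name means_by_label is invented here, M must have as many rows and columns as groups has entries (as in the question's 100 x 100 setup), and labels are visited in sorted order, whereas the original iterates a Python set whose order is not guaranteed.

```python
import numpy as np

def means_by_label(M, groups):
    """Per-row means over the columns of each label, skipping the row's own label."""
    labels = np.unique(groups)                    # sorted distinct labels
    onehot = groups[:, None] == labels[None, :]   # (n_cols, n_labels) column membership
    sums = M @ onehot.astype(float)               # (n_rows, n_labels) per-label sums
    counts = onehot.sum(axis=0)                   # number of columns per label
    all_means = sums / counts                     # every label occurs, so counts >= 1
    keep = groups[:, None] != labels[None, :]     # drop the label of row i itself
    return all_means[keep]                        # flattened row by row

rng = np.random.default_rng(0)
M = rng.permutation(10000).reshape(100, 100)
groups = rng.integers(1, 10, 100)
result = means_by_label(M, groups)
```

The result should match the question's loop when that loop visits labels in sorted order; timing it against the original with timeit is the only way to know how much it helps.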

