Given a 1-D array, is it possible to calculate the average over given groups of different sizes without looping? Instead of

avgs = [One_d_array[groups[i]].mean() for i in range(len(groups))]

Something like

avgs = np.mean(One_d_array, groups)

Basically I want to do this:

M = np.arange(10000)
np.random.shuffle(M)
M.resize(100,100)
groups = np.random.randint(1, 10, 100)

def means(M, groups):
    means = []
    for i, label in enumerate(groups):
        means.extend([M[i][groups == j].mean() for j in set(groups).difference([label])])
    return means

This runs at

%timeit means(M, groups)
100 loops, best of 3: 12.2 ms per loop

A speed-up of 10 times or so would already be great.

2 Answers

Whether you see a loop or not, there is a loop.
Here's one way, but the loop is simply hidden in the call to map:

In [10]: import numpy as np

In [11]: groups = [[1,2],[3,4,5]]

In [12]: list(map(np.mean, groups))
Out[12]: [1.5, 4.0]
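
If the aim is mainly to keep the arithmetic out of the interpreter, one possible sketch is to flatten the index groups once and let np.add.reduceat sum each segment. The function name group_means is made up for illustration, and the sketch assumes every group is non-empty. Building the flat index still walks the groups in Python, so the loop is reduced rather than eliminated.

```python
import numpy as np

def group_means(values, groups):
    """Mean of values[g] for each index list g in groups (hypothetical helper).

    Assumes every group holds at least one index.
    """
    lengths = np.array([len(g) for g in groups])
    # One flat index array: all of group 0, then all of group 1, ...
    flat_idx = np.concatenate([np.asarray(g, dtype=np.intp) for g in groups])
    # Offset in flat_idx at which each group starts
    starts = np.concatenate(([0], np.cumsum(lengths)[:-1]))
    # Sum each contiguous segment in a single NumPy call
    sums = np.add.reduceat(values[flat_idx], starts)
    return sums / lengths

x = np.array([1, 2, 3, 4, 5])
print(group_means(x, [[0, 1, 2], [3, 4]]))  # means of the two groups: 2.0 and 4.5
```

Whether this beats the list comprehension depends on the number and size of the groups, so it is worth checking with timeit.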

3 Comments

I guess you're right. However, is there any way to make the calculation of the mean faster? It's a bottleneck in several functions I'm dealing with.
A NumPy array can be fast when you apply NumPy functions such as np.mean to a single large array. NumPy may not be very fast if you have to call np.mean on lots of small arrays. If you can't arrange your data into a single large array (maybe because the rows have different lengths) then you may be better off using plain Python lists than lots of small NumPy arrays. (It's hard to tell -- you have to benchmark with timeit.)
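
For the label-array case in the question, the data already is a single large array, so the per-label sums can be taken with one matrix product. The following is a rough sketch, not a tuned implementation: means_vectorized is a made-up name, it assumes M is square with one label per row and column (as in the question's 100x100 example), and it returns the means for each row ordered by ascending label, skipping the row's own label.

```python
import numpy as np

def means_vectorized(M, groups):
    """Per-row means over every label except the row's own (hypothetical helper).

    Assumes M is square and groups holds one label per row/column.
    """
    labels, inverse = np.unique(groups, return_inverse=True)
    cols = np.arange(labels.size)
    # onehot[c, k] is 1.0 when column c carries the k-th distinct label
    onehot = (inverse[:, None] == cols).astype(float)
    sums = M @ onehot               # shape (n_rows, n_labels)
    counts = onehot.sum(axis=0)     # number of columns per label, never zero
    all_means = sums / counts
    # Drop the entry that corresponds to each row's own label
    keep = inverse[:, None] != cols
    return all_means[keep]

rng = np.random.default_rng(0)
M = rng.permutation(10000).reshape(100, 100)
groups = rng.integers(1, 10, 100)
print(means_vectorized(M, groups).shape)
```

The result is a flat array rather than a list, so compare against the loop version with np.allclose and time both with %timeit before relying on it.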
If you are computing the mean of groups over and over again (with small changes to groups in between iterations) then it would be smart to keep running totals of the sum of each item in groups. That way you can update the totals as groups changes, and it is easy and quick to compute the new means.
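
A minimal sketch of the running-totals idea, assuming each element carries a non-negative integer group label and that a "small change" means moving one element to another group. The class RunningGroupMeans and its method names are invented for illustration.

```python
import numpy as np

class RunningGroupMeans:
    """Keeps per-group sums and counts so means can be refreshed cheaply."""

    def __init__(self, values, labels, n_groups):
        self.values = np.asarray(values, dtype=float)
        self.labels = np.array(labels)  # copy, so the caller's labels stay untouched
        # Initial totals: one pass over the data
        self.sums = np.bincount(self.labels, weights=self.values, minlength=n_groups)
        self.counts = np.bincount(self.labels, minlength=n_groups)

    def move(self, i, new_label):
        """Reassign element i to new_label, updating totals in O(1)."""
        old = self.labels[i]
        self.sums[old] -= self.values[i]
        self.counts[old] -= 1
        self.sums[new_label] += self.values[i]
        self.counts[new_label] += 1
        self.labels[i] = new_label

    def means(self):
        # A group left empty yields nan instead of raising a warning
        with np.errstate(invalid="ignore", divide="ignore"):
            return self.sums / self.counts

tracker = RunningGroupMeans([1, 2, 3, 4, 5], [0, 0, 0, 1, 1], n_groups=2)
tracker.move(2, 1)       # element with value 3 moves from group 0 to group 1
print(tracker.means())   # group 0 now averages 1 and 2, group 1 averages 3, 4 and 5
```

Each move touches only two sums and two counts, so the cost per update does not depend on the group sizes.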

Another hidden loop is the use of np.vectorize:

>>> x = np.array([1,2,3,4,5])
>>> groups = [[0,1,2], [3,4]]
>>> np.vectorize(lambda group: np.mean(x[group]), otypes=[float])(groups)
array([ 2. , 4.5])
