I have these numpy arrays:

array1 = np.array([-1, -1, 1, 1, 2, 1, 2, 2])
array2 = np.array([34.2, 11.2, 22.1, 78.2, 55.0, 66.87, 33.3, 11.56])

Now I want to return a 2D array containing, for each distinct value in array1, the mean of the corresponding elements of array2, so my output would look something like this:

array([[-1, 22.7],
       [ 1, 55.7],
       [ 2, 33.3]])

Is there an efficient way to do this without first concatenating the two 1D arrays into one 2D array? Thanks!

2 Answers

This is a typical grouping operation, and the numpy_indexed package (disclaimer: I am its author) provides extensions to numpy to perform these types of operations efficiently and concisely:

import numpy_indexed as npi
groups, means = npi.group_by(array1).mean(array2)

Note that you can easily perform other kinds of reductions in the same manner, such as a median.
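For comparison, here is a minimal plain-NumPy sketch of a per-group median using a sort-and-split approach. It is not how numpy_indexed works internally; the names order, starts and medians are illustrative only and do not come from the package.

```python
import numpy as np

array1 = np.array([-1, -1, 1, 1, 2, 1, 2, 2])
array2 = np.array([34.2, 11.2, 22.1, 78.2, 55.0, 66.87, 33.3, 11.56])

# Sort both arrays by key so that equal keys become contiguous
order = np.argsort(array1, kind="stable")
sorted_keys = array1[order]
sorted_vals = array2[order]

# Unique keys and the index at which each group starts in the sorted arrays
groups, starts = np.unique(sorted_keys, return_index=True)

# Split the sorted values at the group boundaries and reduce each chunk
chunks = np.split(sorted_vals, starts[1:])
medians = np.array([np.median(chunk) for chunk in chunks])

out = np.column_stack((groups, medians))
```

The Python-level loop runs once per group, so this should be reasonable when there are few distinct keys; with many groups a dedicated grouping tool is likely the better choice.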

Here's an approach using np.unique and np.bincount -

# Get unique array1 elems, tag them starting from 0 and get their tag counts
unq,ids,count = np.unique(array1,return_inverse=True,return_counts=True)

# Use the tags/IDs to sum the array2 elems per ID (bincount with weights),
# then divide by the ID counts to get the per-ID average values
out = np.column_stack((unq,np.bincount(ids,array2)/count))

Sample run -

In [16]: array1 = np.array([-1, -1, 1, 1, 2, 1, 2, 2])
    ...: array2 = np.array([34.2, 11.2, 22.1, 78.2, 55.0, 66.87, 33.3, 11.56])
    ...: 

In [18]: out
Out[18]: 
array([[ -1.        ,  22.7       ],
       [  1.        ,  55.72333333],
       [  2.        ,  33.28666667]])
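To make the steps easier to follow, the sketch below repeats the same computation with the intermediate arrays spelled out. The variable names sums and means are added here for illustration, and the values in the comments are approximate.

```python
import numpy as np

array1 = np.array([-1, -1, 1, 1, 2, 1, 2, 2])
array2 = np.array([34.2, 11.2, 22.1, 78.2, 55.0, 66.87, 33.3, 11.56])

unq, ids, count = np.unique(array1, return_inverse=True, return_counts=True)
# unq   -> [-1, 1, 2]                  sorted distinct values of array1
# ids   -> [0, 0, 1, 1, 2, 1, 2, 2]    position in unq of each array1 element
# count -> [2, 3, 3]                   number of occurrences of each value

# bincount with weights adds up the array2 elements that share an ID
sums = np.bincount(ids, weights=array2)  # about [45.4, 167.17, 99.86]

# Per-group mean = per-group sum / per-group count
means = sums / count

out = np.column_stack((unq, means))
```

Because unq holds integers and means holds floats, column_stack upcasts the first column to float, which is why the keys print as -1., 1. and 2. in the sample run.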
