Average using grouping value in another vector (numpy / Python)

Question

I'd like to take the average of one vector based on grouping information in another vector. The two vectors are the same length. I've created a minimal example below based on averaging predictions for each user. How do I do that in NumPy?

       >>> pred
           [ 0.99  0.23  0.11  0.64  0.45  0.55 0.76  0.72  0.97 ] 
       >>> users
           ['User2' 'User3' 'User2' 'User3' 'User0' 'User1' 'User4' 'User4' 'User4']

Your two arrays are different lengths... Also, are you looking for a solution in NumPy or (a much easier solution) in Pandas?
– Alex Riley, Commented Mar 24, 2015 at 22:24
Sorry about that, they're now the same length. I'd prefer to stay in NumPy as I'm just learning Python and have decided to postpone Pandas for a little while.
– pir, Commented Mar 24, 2015 at 22:32

ali_m · Accepted Answer · 2015-03-24 22:59:06Z

4

A 'pure numpy' solution might use a combination of np.unique and np.bincount:

import numpy as np

pred = [0.99,  0.23,  0.11,  0.64,  0.45,  0.55, 0.76,  0.72,  0.97]
users = ['User2', 'User3', 'User2', 'User3', 'User0', 'User1', 'User4',
         'User4', 'User4']

# assign integer indices to each unique user name, and get the total
# number of occurrences for each name
unames, idx, counts = np.unique(users, return_inverse=True, return_counts=True)

# now sum the values of pred corresponding to each index value
sum_pred = np.bincount(idx, weights=pred)

# finally, divide by the number of occurrences for each user name
mean_pred = sum_pred / counts

print(unames)
# ['User0' 'User1' 'User2' 'User3' 'User4']

print(mean_pred)
# [ 0.45        0.55        0.55        0.435       0.81666667]
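The three steps above can be wrapped in a small helper. This is only a sketch: the name `group_mean` and the input checks are illustrative assumptions, not part of NumPy.

```python
import numpy as np

def group_mean(values, groups):
    """Return (unique group labels, mean of values per label).

    Hypothetical helper: assumes two 1-D sequences of equal length.
    """
    values = np.asarray(values, dtype=float)
    groups = np.asarray(groups)
    if values.ndim != 1 or values.shape != groups.shape:
        raise ValueError("values and groups must be 1-D and the same length")

    # idx maps every element to the position of its label in `names`;
    # counts holds how often each label occurs
    names, idx, counts = np.unique(groups, return_inverse=True,
                                   return_counts=True)

    # sum the values that share an index, then divide by the group size
    sums = np.bincount(idx, weights=values, minlength=names.size)
    return names, sums / counts

pred = [0.99, 0.23, 0.11, 0.64, 0.45, 0.55, 0.76, 0.72, 0.97]
users = ['User2', 'User3', 'User2', 'User3', 'User0', 'User1', 'User4',
         'User4', 'User4']

names, means = group_mean(pred, users)
for name, mean in zip(names, means):
    print(name, round(float(mean), 4))
```

The labels come back sorted because np.unique sorts its output, so the order of the means follows the sorted labels rather than the order of first appearance.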

If you have pandas installed, DataFrames have some very nice methods for grouping and summarizing data:

import pandas as pd

df = pd.DataFrame({'name':users, 'pred':pred})

print(df.groupby('name').mean())
#            pred
# name           
# User0  0.450000
# User1  0.550000
# User2  0.550000
# User3  0.435000
# User4  0.816667

answered Mar 24, 2015 at 22:59

ali_m


2 Comments

ali_m Over a year ago

I don't really understand what you mean by "unique label for each user" - in your example it seems that User2 would have corresponding label values of both 0 and 1. Also, on SO you should post follow-up questions separately (you can include a link to the original question in order to provide context).

pir Over a year ago

Okay, I'll do that. Thanks.

Jaime (edited by pir) · Accepted Answer · 2015-03-25 12:38:43Z

1

If you want to stick to numpy, the simplest is to use np.unique and np.bincount:

>>> import numpy as np
>>> pred = np.array([0.99, 0.23, 0.11, 0.64, 0.45, 0.55, 0.76, 0.72, 0.97])
>>> users = np.array(['User2', 'User3', 'User2', 'User3', 'User0', 'User1',
...                   'User4', 'User4', 'User4'])
>>> unq, idx, cnt = np.unique(users, return_inverse=True, return_counts=True)
>>> avg = np.bincount(idx, weights=pred) / cnt
>>> unq
array(['User0', 'User1', 'User2', 'User3', 'User4'],
      dtype='|S5')
>>> avg
array([ 0.45      ,  0.55      ,  0.55      ,  0.435     ,  0.81666667])
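Since idx[i] is the position of users[i] in unq, indexing avg with idx puts each user's mean back on every original row. A minimal sketch of that, where the names `per_row_avg` and `centered` are made up for illustration:

```python
import numpy as np

pred = np.array([0.99, 0.23, 0.11, 0.64, 0.45, 0.55, 0.76, 0.72, 0.97])
users = np.array(['User2', 'User3', 'User2', 'User3', 'User0', 'User1',
                  'User4', 'User4', 'User4'])

unq, idx, cnt = np.unique(users, return_inverse=True, return_counts=True)
avg = np.bincount(idx, weights=pred) / cnt

# repeat each user's mean at every position where that user occurs
per_row_avg = avg[idx]

# example use: subtract the user's own mean from each prediction
centered = pred - per_row_avg

print(per_row_avg)
print(centered)
```

This is handy when you need the group mean aligned with the original vectors, for example to centre each prediction on its user's average.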

edited Mar 25, 2015 at 12:38

pir


answered Mar 24, 2015 at 22:54

Jaime


Eelco Hoogendoorn · Accepted Answer · 2016-04-02 20:39:05Z

1

A compact solution is to use numpy_indexed (disclaimer: I am its author), which implements a solution similar to the vectorized one proposed by Jaime, but with a cleaner interface and more tests:

import numpy_indexed as npi
npi.group_by(users).mean(pred)
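If you would rather not add a dependency, a sort-based group-by can be sketched in plain NumPy. To be clear, this is not numpy_indexed's actual implementation; `group_mean_sorted` is a hypothetical name, and the sketch assumes non-empty 1-D inputs of equal length.

```python
import numpy as np

def group_mean_sorted(keys, values):
    """Sort-based group mean (illustrative sketch, non-empty 1-D inputs)."""
    keys = np.asarray(keys)
    values = np.asarray(values, dtype=float)

    # bring equal keys next to each other
    order = np.argsort(keys, kind="stable")
    sorted_keys = keys[order]
    sorted_vals = values[order]

    # a group starts at position 0 and wherever the sorted key changes
    is_start = np.concatenate(([True], sorted_keys[1:] != sorted_keys[:-1]))
    starts = np.flatnonzero(is_start)

    # sum each contiguous run, then divide by the run length
    sums = np.add.reduceat(sorted_vals, starts)
    counts = np.diff(np.append(starts, keys.size))
    return sorted_keys[starts], sums / counts

pred = [0.99, 0.23, 0.11, 0.64, 0.45, 0.55, 0.76, 0.72, 0.97]
users = ['User2', 'User3', 'User2', 'User3', 'User0', 'User1', 'User4',
         'User4', 'User4']

unique_users, means = group_mean_sorted(users, pred)
print(unique_users)
print(means)
```

The same run boundaries work with other ufunc reductions, e.g. np.maximum.reduceat(sorted_vals, starts) for a per-group maximum, which np.bincount cannot do.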

edited Apr 2, 2016 at 20:39

answered Apr 2, 2016 at 13:30

Eelco Hoogendoorn
