Averaging values in array corresponding to the values of another array

Question

I have an array that contains numbers that are distances, and another that represents certain values at that distance. How do I calculate the average of all the data at a fixed value of the distance?

e.g. distances (d): [1 1 14 6 1 12 14 6 6 7 4 3 7 9 1 3 3 6 5 8]

e.g. data corresponding to each entry of the distances:

[3.3 2.1 3.5 2.5 4.6 7.4 2.6 7.8 9.2 10.11 14.3 2.5 6.7 3.4 7.5 8.5 9.7 4.3 2.8 4.1]

Therefore value=3.3 at d=1; value=2.1 at d=1; value=3.5 at d=14; etc.

For example, at distance d=6 I should take the mean of 2.5, 7.8, 9.2 and 4.3.

I've used the following code that works, but I do not know how to store the values into a new array:

from numpy import mean

for d in set(key):
    print(d, mean([dist[i] for i in range(len(key)) if key[i] == d]))

Please help! Thanks

Steve Archer · Accepted Answer · 2018-12-07 21:15:09Z

1

You've got the hard part done. Putting your results into a new list is as easy as:

result = []
for d in set(key): 
    result.append(mean([dist[i] for i in range(len(key)) if key[i] == d]))
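One thing to watch with the loop above: a set has no guaranteed order, so the entries of result are not obviously tied to their distances. Here is a minimal sketch that keeps the pairing, assuming key holds the distances and dist holds the data as in the question's code. The names means_by_distance, distances and means are illustrative, not from the question.

```python
from numpy import array, mean

# Sample data from the question; `key` holds the distances, `dist` the values.
key = [1, 1, 14, 6, 1, 12, 14, 6, 6, 7, 4, 3, 7, 9, 1, 3, 3, 6, 5, 8]
dist = [3.3, 2.1, 3.5, 2.5, 4.6, 7.4, 2.6, 7.8, 9.2, 10.11,
        14.3, 2.5, 6.7, 3.4, 7.5, 8.5, 9.7, 4.3, 2.8, 4.1]

# Map each distinct distance to the mean of its values, in ascending order.
means_by_distance = {
    d: mean([v for k, v in zip(key, dist) if k == d])
    for d in sorted(set(key))
}

# Parallel arrays, if arrays are preferred over a dict.
distances = array(list(means_by_distance.keys()))
means = array(list(means_by_distance.values()))
```

With this layout, means[i] is the average of the data at distances[i].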

answered Dec 7, 2018 at 21:15

Steve Archer


rafaelc · Accepted Answer · 2018-12-07 21:46:21Z

1

Using pandas

import pandas as pd

g = pd.DataFrame({'d':d, 'k':k}).groupby('d')

Option 1: transform to get the values in the same positions

g.transform('mean').values

Option 2: take the mean directly and get a dict with the mapping

g.mean().to_dict()['k']
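For reference, the two options can be put together on the sample data roughly as follows. This is a sketch, assuming d holds the distances and k the values. Unlike the snippets above, it selects the 'k' column before aggregating, so each result is one-dimensional. The names per_row and per_distance are illustrative.

```python
import numpy as np
import pandas as pd

# Sample data from the question: `d` are distances, `k` the values.
d = np.array([1, 1, 14, 6, 1, 12, 14, 6, 6, 7, 4, 3, 7, 9, 1, 3, 3, 6, 5, 8])
k = np.array([3.3, 2.1, 3.5, 2.5, 4.6, 7.4, 2.6, 7.8, 9.2, 10.11,
              14.3, 2.5, 6.7, 3.4, 7.5, 8.5, 9.7, 4.3, 2.8, 4.1])

g = pd.DataFrame({'d': d, 'k': k}).groupby('d')

# Option 1: one mean per input row, aligned with the original positions.
per_row = g['k'].transform('mean').to_numpy()

# Option 2: one mean per distinct distance, as a {distance: mean} dict.
per_distance = g['k'].mean().to_dict()
```

Option 1 suits the case where the averages need to line up with the input arrays. Option 2 suits lookups by distance.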

answered Dec 7, 2018 at 21:46

rafaelc


user3483203 · Accepted Answer · 2018-12-07 21:39:36Z

0

Setup

import numpy as np

d = np.array(
  [1, 1, 14, 6, 1, 12, 14, 6, 6, 7, 4, 3, 7, 9, 1, 3, 3, 6, 5, 8]
)

k = np.array(
  [3.3,2.1,3.5,2.5,4.6,7.4,2.6,7.8,9.2,10.11,14.3,2.5,6.7,3.4,7.5,8.5,9.7,4.3,2.8,4.1]
)

`scipy.sparse` + `csr_matrix`

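from scipy import sparse

s = d.shape[0]         # number of samples
r = np.arange(s+1)     # indptr: exactly one stored value per row
m = d.max() + 1        # one column per possible distance value
b = np.bincount(d)     # number of samples at each distance

# Row i holds k[i] in column d[i]; summing over rows gives per-distance totals.
out = sparse.csr_matrix( (k, d, r), (s, m) ).sum(0).A1

# Per-distance mean, mapped back to the position of each sample.
(out / b)[d]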

array([ 4.375,  4.375,  3.05 ,  5.95 ,  4.375,  7.4  ,  3.05 ,  5.95 ,
        5.95 ,  8.405, 14.3  ,  6.9  ,  8.405,  3.4  ,  4.375,  6.9  ,
        6.9  ,  5.95 ,  2.8  ,  4.1  ])
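The same per-distance totals can also be computed without a sparse matrix. The sketch below is an alternative technique, not the sparse-matrix approach above: it uses np.unique with return_inverse and np.bincount with weights. It assumes d and k as defined in the setup. Because it bins on the distinct distances only, it avoids the 0/0 entries that out / b produces for distances that never occur. The names uniq, inverse, sums, counts, means and per_sample are illustrative.

```python
import numpy as np

d = np.array([1, 1, 14, 6, 1, 12, 14, 6, 6, 7, 4, 3, 7, 9, 1, 3, 3, 6, 5, 8])
k = np.array([3.3, 2.1, 3.5, 2.5, 4.6, 7.4, 2.6, 7.8, 9.2, 10.11,
              14.3, 2.5, 6.7, 3.4, 7.5, 8.5, 9.7, 4.3, 2.8, 4.1])

# Distinct distances, plus for each sample the index of its distance in `uniq`.
uniq, inverse = np.unique(d, return_inverse=True)

# Sum of values and number of samples for each distinct distance.
sums = np.bincount(inverse, weights=k)
counts = np.bincount(inverse)

means = sums / counts        # one mean per entry of `uniq`
per_sample = means[inverse]  # the same means, mapped back to each sample
```

Here means pairs with uniq (one entry per distinct distance), while per_sample has the same layout as the array printed above.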

edited Dec 7, 2018 at 21:39

answered Dec 7, 2018 at 21:29

user3483203


cavalcantelucas · Accepted Answer · 2018-12-07 22:03:24Z

0

You could use NumPy's array in combination with where, which is also from NumPy.

You can define a function that returns the positions of a desired distance:

from numpy import mean, array, where  

def key_distances(distances, d):
  return where(distances == d)[0]

Then use it to get the values at those positions.

Let's say you have:

d = array([1,1,14,6,1,12,14,6,6,7,4,3,7,9,1,3,3,6,5,8])
v = array([3.3,2.1,3.5,2.5,4.6,7.4,2.6,7.8,9.2,10.11,14.3,2.5,6.7,3.4,7.5,8.5,9.7,4.3,2.8,4.1])

Then you might do something like:

vs = v[key_distances(d,d[1])]

Then get your mean:

print(mean(vs))
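To cover every distance rather than a single one, the same helper can be applied to each distinct value. This is a sketch under the assumptions above (d are the distances, v the values). The names unique_d and means are illustrative.

```python
from numpy import array, mean, unique, where

def key_distances(distances, d):
    # Positions at which `distances` equals `d`.
    return where(distances == d)[0]

d = array([1, 1, 14, 6, 1, 12, 14, 6, 6, 7, 4, 3, 7, 9, 1, 3, 3, 6, 5, 8])
v = array([3.3, 2.1, 3.5, 2.5, 4.6, 7.4, 2.6, 7.8, 9.2, 10.11,
           14.3, 2.5, 6.7, 3.4, 7.5, 8.5, 9.7, 4.3, 2.8, 4.1])

# One mean per distinct distance, in ascending order of distance.
unique_d = unique(d)
means = array([mean(v[key_distances(d, value)]) for value in unique_d])
```

This scans the whole array once per distinct distance, which is fine for small inputs like the one in the question.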

answered Dec 7, 2018 at 22:03

cavalcantelucas


Eelco Hoogendoorn · Accepted Answer · 2018-12-08 08:49:00Z

0

The numpy_indexed package (disclaimer: I am its author) was designed with these use-cases in mind:

import numpy_indexed as npi
npi.group_by(d).mean(dist)

Pandas can do similar things, but its API isn't really tailored to them, and for such an elementary operation as a group-by I feel it's kind of wrong to have to hoist your data into a completely new data structure.

answered Dec 8, 2018 at 8:49

Eelco Hoogendoorn
