numpy array mapping and take average

Question

I have three arrays:

import numpy as np
value = np.array ([1, 3, 3, 5, 5, 7, 3])
index = np.array ([1, 1, 3, 3, 6, 6, 6])
data  = np.array ([1, 2, 3, 4, 5, 6])

The arrays "index" and "value" have the same size, and I want to group the items in "value" by their key in "index" and take the average of each group. For example, the first two items [1, 3, ... in "value" share the key 1 in "index", so the corresponding entry in the final array is the mean of the 1st and 2nd items in "value": (1 + 3) / 2, which equals 2.

The final array is:

[2, nan, 4, nan, nan, 5]

The first value is the average of the 1st and 2nd items of "value".
The second value is nan because the key 2 does not appear in "index".
The third value is the average of the 3rd and 4th items of "value", and so on.

Thanks for your help!!!

Regards, Roy

"[...]because there is not any key in index" - can you explain more clearly how the entries in the index array relate to the average values?
– Jim Brissom, Commented Jan 13, 2011 at 2:00
Oh, sorry, maybe my explanation was not clear. The arrays "index" and "value" have the same size, and I want to group the items in "value" by taking their average. For example, the first two items [1, 3, ... in "value" have the same key 1 in "index", so the corresponding entry in the final array is the mean of the 1st and 2nd items in "value": (1 + 3) / 2, which equals 2.
– Roy, Commented Jan 13, 2011 at 2:08
Just edit your original post. Comments are not really meant for that.
– Jim Brissom, Commented Jan 13, 2011 at 2:13

Steve Tjoa · Accepted Answer · 2011-01-13 07:40:16Z

3

>>> [value[index==i].mean() for i in data]
[2.0, nan, 4.0, nan, nan, 5.0]


Sven Marnach · Accepted Answer · 2011-01-13 10:21:33Z

3

Maybe you would like to use numpy.bincount()?

value = np.array([1, 3, 3, 5, 5, 7, 3])
index = np.array([1, 1, 3, 3, 6, 6, 6])
np.bincount(index, value) / np.bincount(index)
# array([ NaN,   2.,  NaN,   4.,  NaN,  NaN,   5.])
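Note that the result above has one entry per integer from 0 to index.max(), so its first entry belongs to key 0, which is not in "data". A minimal sketch of how the result could be lined up with "data", assuming the keys are non-negative integers small enough to use as array positions (the names size, sums, counts, means, and result are only illustrative):

```python
import numpy as np

value = np.array([1, 3, 3, 5, 5, 7, 3])
index = np.array([1, 1, 3, 3, 6, 6, 6])
data = np.array([1, 2, 3, 4, 5, 6])

# minlength makes sure every key in data has a slot, even if it is
# larger than the largest key that occurs in index
size = data.max() + 1
sums = np.bincount(index, weights=value, minlength=size)
counts = np.bincount(index, minlength=size)

# keys that never occur give 0 / 0, which is nan; silence the warning
with np.errstate(invalid="ignore", divide="ignore"):
    means = sums / counts

# pick out the mean for each key listed in data
result = means[data]
print(result)
```

For the sample arrays, result holds 2, nan, 4, nan, nan, 5, matching the array asked for in the question.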


Paul · Accepted Answer · 2011-01-13 02:23:39Z

0

Is this the general idea you are looking for?

import numpy as np
value = np.array([1, 3, 3, 5, 5, 7, 3])
index = np.array([1, 1, 3, 3, 6, 6, 6])
data  = np.array([1, 2, 3, 4, 5, 6])

answer = np.array(data, dtype=float)
for i, e in enumerate(data):
    idx = np.where(index == e)[0]  # positions in index whose key equals e
    val = value[idx]
    answer[i] = np.mean(val)       # nan (with a warning) when no key matches

print(answer) # [  2.  nan   4.  nan  nan   5.]

If your data array is very large, there may be better solutions.
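One possible loop-free approach, sketched here under the assumption that the arrays are one-dimensional and the keys can be sorted, is to relabel the keys with np.unique, sum and count with np.bincount, and then look up each entry of "data" with np.searchsorted. The function name grouped_mean is hypothetical, not something NumPy provides.

```python
import numpy as np

def grouped_mean(index, value, data):
    """Mean of value per key in index, reported for each key in data."""
    # keys: sorted unique keys; inverse: position of each index entry in keys
    keys, inverse = np.unique(index, return_inverse=True)
    sums = np.bincount(inverse, weights=value)
    counts = np.bincount(inverse)
    means = sums / counts  # every key in keys occurs at least once

    # find where each entry of data would sit among the sorted keys
    pos = np.clip(np.searchsorted(keys, data), 0, len(keys) - 1)
    found = keys[pos] == data

    # keys from data that never occur in index stay nan
    out = np.full(data.shape, np.nan)
    out[found] = means[pos[found]]
    return out

value = np.array([1, 3, 3, 5, 5, 7, 3])
index = np.array([1, 1, 3, 3, 6, 6, 6])
data = np.array([1, 2, 3, 4, 5, 6])
print(grouped_mean(index, value, data))
```

Unlike the loop, this never builds a len(value) by len(data) array, and it does not require the keys to be small non-negative integers; the sort inside np.unique is the main cost.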


6 Comments

Roy Over a year ago

Yes, my data is actually very large :P, around 4320000 records. Sorry for the unclear question.

Paul Over a year ago

How big are value and index then?

Paul Over a year ago

Is a len(value) by len(data) 2D array too big to fit in memory?

Roy Over a year ago

"value" and "index" each have 4320000 entries; "data" is smaller, with 1124000. There is not enough memory to build an array that large.

Paul Over a year ago

Then I think I'd stick with the above solution. You could use a boolean array mask instead of where to try to optimize, but I think you are stuck iterating in Python. If it is still too slow, you can try Cython.


Roy · Accepted Answer · 2011-01-13 09:52:04Z

0

After some searching, I found that numpy.histogram can handle the huge array:

value = np.array ([1, 3, 3, 5, 5, 7, 3], dtype='float')
index = np.array ([1, 1, 3, 3, 6, 6, 6], dtype='float')
data = np.array ([1, 2, 3, 4, 5, 6])

sums = np.histogram(index , bins=np.arange(index.min(), index.max()+2), weights=value)[0]
counter = np.histogram(index , bins=np.arange(index.min(), index.max()+2))[0]

sums / counter

array([ 2., NaN, 4., NaN, NaN, 5.])
