
I need to average the Y values, grouped by the corresponding values in the X array:

X=np.array([  1,  1,  2,  2,  2,  2,  3,  3 ... ])

Y=np.array([ 10, 30, 15, 10, 16, 10, 15, 20 ... ])

In other words, the 1 values in X correspond to 10 and 30 in Y, whose average is 20; the 2 values correspond to 15, 10, 16, and 10, whose average is 12.75; and so on.

How can I calculate these average values?
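Written out, the average for a label k is the sum of the Y values whose X equals k, divided by how many such values there are. For the first two groups of the sample data:

$$\bar{y}_k = \frac{\sum_{i:\,x_i = k} y_i}{\#\{i : x_i = k\}}, \qquad \bar{y}_1 = \frac{10 + 30}{2} = 20, \qquad \bar{y}_2 = \frac{15 + 10 + 16 + 10}{4} = \frac{51}{4} = 12.75$$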

Comment: np.bincount(X-1, Y) / np.bincount(X-1), if the group labels are consecutive integers starting from 1. (Commented Jun 14, 2022 at 19:33)
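The one-liner relies on np.bincount summing the weights per bin when weights are given, and counting occurrences per bin when they are not. A minimal sketch with each step named, using the eight-element arrays the answers below use (the question's `...` is assumed to end there for illustration):

```python
import numpy as np

X = np.array([1, 1, 2, 2, 2, 2, 3, 3])
Y = np.array([10, 30, 15, 10, 16, 10, 15, 20])

# Assumes labels are consecutive integers starting at 1, so X - 1 maps
# each label to a bin index 0, 1, 2, ...
sums = np.bincount(X - 1, weights=Y)   # per-group sum of Y
counts = np.bincount(X - 1)            # per-group number of elements
means = sums / counts
print(means)  # [20.   12.75 17.5 ]
```

If a label inside the range is missing, its bin has a count of 0 and the division produces nan for that entry.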

5 Answers


One option is to use a property of least-squares regression on one-hot (dummy) encoded categories: the fitted coefficient for each category is the mean of y within that category.

import numpy as np

x = np.array([  1,  1,  2,  2,  2,  2,  3,  3 ])
y = np.array([ 10, 30, 15, 10, 16, 10, 15, 20 ])

x_dummies = x[:, None] == np.unique(x)
means = np.linalg.lstsq(x_dummies, y, rcond=None)[0]
print(means) # [20.   12.75 17.5 ]
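Why the coefficients come out as group means: with a one-hot design matrix D, the normal equations have D.T @ D equal to a diagonal matrix of group sizes and D.T @ y equal to the per-group sums, so the least-squares solution is just sums divided by sizes. A small sketch checking this numerically (the names D, gram, rhs and beta are illustrative):

```python
import numpy as np

x = np.array([1, 1, 2, 2, 2, 2, 3, 3])
y = np.array([10, 30, 15, 10, 16, 10, 15, 20])

# One-hot design matrix: column k is 1 where x equals the k-th unique label
D = (x[:, None] == np.unique(x)).astype(float)

# Normal equations: D.T @ D is diagonal with the group sizes,
# and D.T @ y holds the per-group sums of y
gram = D.T @ D
rhs = D.T @ y
beta = np.linalg.solve(gram, rhs)

# Because gram is diagonal, solving it reduces to sums / counts
print(beta)
print(rhs / np.diag(gram))
```

In practice the direct sums / counts form is cheaper than building and solving the full system.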

You can try using pandas

import pandas as pd
import numpy as np

N = pd.DataFrame(np.transpose([X,Y]),
             columns=['X', 'Y']).groupby('X')['Y'].mean().to_numpy()
# array([20.  , 12.75, 17.5 ])
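If the group sizes are also of interest, a sketch that groups a Series by the X array directly and aggregates several statistics at once (assuming a reasonably recent pandas; the names stats, means and labels are illustrative):

```python
import numpy as np
import pandas as pd

X = np.array([1, 1, 2, 2, 2, 2, 3, 3])
Y = np.array([10, 30, 15, 10, 16, 10, 15, 20])

# Group Y by the X array (same length), then compute several statistics at once
stats = pd.Series(Y).groupby(X).agg(["mean", "count"])
print(stats)

means = stats["mean"].to_numpy()    # per-group averages
labels = stats.index.to_numpy()     # the distinct X values, sorted
```

Keeping the index alongside the means avoids having to recompute np.unique(X) to know which average belongs to which label.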

Comments:

Why so complicated? pd.Series(Y).groupby(X).mean().to_numpy() ;)

That makes a lot more sense. I always forget that you can groupby an array.

import numpy as np

X = np.array([  1,  1,  2,  2,  2,  2,  3,  3])

Y = np.array([ 10, 30, 15, 10, 16, 10, 15, 20])

# Distinct values of X
unique_vals = np.unique(X)

# Loop over every distinct value
for val in unique_vals:
    # Find the indexes in X that hold this value
    idx = np.where(X == val)
    # Mean of the Y values at those indexes
    aver = np.mean(Y[idx])
    print(f"Average for {val}: {aver}")

Result:

Average for 1: 20.0

Average for 2: 12.75

Average for 3: 17.5
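The loop scans X once per distinct value. A vectorized sketch of the same computation, swapping in np.unique with return_inverse and return_counts plus np.bincount; unlike the bincount comment on the question, it does not assume the labels are consecutive integers:

```python
import numpy as np

X = np.array([1, 1, 2, 2, 2, 2, 3, 3])
Y = np.array([10, 30, 15, 10, 16, 10, 15, 20])

# labels: sorted distinct values of X
# inverse: for each element of X, the index of its label in `labels`
# counts: how many times each label occurs
labels, inverse, counts = np.unique(X, return_inverse=True, return_counts=True)

# Sum Y per group in one pass, then divide by the group sizes
sums = np.bincount(inverse, weights=Y)
means = sums / counts

for label, mean in zip(labels, means):
    print(f"Average for {label}: {mean}")
```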


You can use something like the code below:

import numpy as np

X=np.array([  1,  1,  2,  2,  2,  2,  3,  3])

Y=np.array([ 10, 30, 15, 10, 16, 10, 15, 20])


def groupby(a, b):
    # Get argsort indices, to be used to sort a and b in the next steps
    sidx = b.argsort(kind='mergesort')
    a_sorted = a[sidx]
    b_sorted = b[sidx]

    # Get the group limit indices (start, stop of groups)
    cut_idx = np.flatnonzero(np.r_[True,b_sorted[1:] != b_sorted[:-1],True])

    # Split input array with those start, stop ones
    out = [a_sorted[i:j] for i,j in zip(cut_idx[:-1],cut_idx[1:])]
    return out

group_by_array=groupby(Y,X)
for item in group_by_array:
    print(np.average(item))

I based this answer on the following question: Group numpy into multiple sub-arrays using an array of values
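A sketch of the same sort-then-split idea that computes the means without building a Python list of sub-arrays, using np.add.reduceat to sum each contiguous run after sorting (the names order, starts and lengths are illustrative):

```python
import numpy as np

X = np.array([1, 1, 2, 2, 2, 2, 3, 3])
Y = np.array([10, 30, 15, 10, 16, 10, 15, 20])

# Stable sort so equal X values become contiguous runs
order = X.argsort(kind="mergesort")
x_sorted = X[order]
y_sorted = Y[order]

# Start index of each run: position 0 plus every position where the label changes
starts = np.flatnonzero(np.r_[True, x_sorted[1:] != x_sorted[:-1]])

# Sum each run in one call, then divide by the run lengths
sums = np.add.reduceat(y_sorted, starts)
lengths = np.diff(np.r_[starts, len(x_sorted)])
means = sums / lengths
labels = x_sorted[starts]
print(labels, means)
```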


I think this solution should work:

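import numpy as np

x = np.array([1, 1, 2, 2, 2, 2, 3, 3])
y = np.array([10, 30, 15, 10, 16, 10, 15, 20])

avg_arr = []
i = 1
while i <= np.max(x):
    # Boolean mask selects every y whose x equals i (groups need not be contiguous)
    my_val = np.average(y[x == i])
    avg_arr.append(my_val)
    i += 1
print(avg_arr)

Definitely not the cleanest, but it works. It assumes the labels in x are integers starting from 1; a label missing from that range would select no elements and produce nan.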
