
I have two arrays of the same length. The first is a boolean (0/1) array, and the second contains the corresponding values.

flag   = [0,0,0,1,1,0,0,0,1,1,1,1,0,1,1]
values = [1,5,6,8,5,6,2,0,1,9,3,8,3,6,2]

I want to return an array in which every position belonging to a run of consecutive 1s in the flag array holds the median of the corresponding values, and every other position holds 0.

For example:

flag   = [0,0,0,1,  1,  0,0,0, 1,  1,  1,  1, 0,1,1]
result = [0,0,0,6.5,6.5,0,0,0,5.5,5.5,5.5,5.5,0,4,4]
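To pin down the expected output, here is a short reference sketch that reproduces the example above. It is not the asker's code; it uses `itertools.groupby` and `statistics.median` from the standard library, assumes `flag` contains only 0s and 1s, and `run_medians_reference` is a hypothetical name. It is written for readability, not speed, and is mainly useful for checking faster versions against.

```python
from itertools import groupby
from statistics import median

def run_medians_reference(flag, values):
    """Hypothetical helper: median of `values` over each run of 1s in `flag`, 0.0 elsewhere."""
    result = []
    # Group consecutive (flag, value) pairs by whether the flag equals 1.
    for is_one, group in groupby(zip(flag, values), key=lambda fv: fv[0] == 1):
        group_values = [v for _, v in group]
        fill = float(median(group_values)) if is_one else 0.0
        result.extend([fill] * len(group_values))
    return result

flag   = [0,0,0,1,1,0,0,0,1,1,1,1,0,1,1]
values = [1,5,6,8,5,6,2,0,1,9,3,8,3,6,2]
print(run_medians_reference(flag, values))
# [0.0, 0.0, 0.0, 6.5, 6.5, 0.0, 0.0, 0.0, 5.5, 5.5, 5.5, 5.5, 0.0, 4.0, 4.0]
```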

My inelegant approach is:

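import numpy as np

values = np.asarray(values)  # values.shape below needs an array, not a list
result = np.zeros(values.shape[0])
vect = []
idx = []
for n in range(result.size):
    if flag[n] > 0:
        vect.append(values[n])
        idx.append(n)
    elif idx:  # a run of 1s just ended: store its median
        result[idx] = np.median(vect)
        vect = []
        idx = []
if idx:  # the last run reaches the end of the array
    result[idx] = np.median(vect)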

It works, but it isn't very Pythonic, and it is very slow because I work with very large arrays.

1 Answer

We can use np.diff to find the transitions between 0 and 1. Then we loop over each pair of 0-to-1 and 1-to-0 transitions and take the median of all values in between.

The resulting loop runs once per group of ones rather than once per element.
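To make the transition indices concrete, here is a small sketch that only computes `begin` and `end` for the example flags. Note that `end` is one past the last 1 of each run, so each run is the half-open slice `[begin, end)`; the zero padding is what makes the final run (which touches the end of the array) still produce an `end` entry.

```python
import numpy as np

flag = [0,0,0,1,1,0,0,0,1,1,1,1,0,1,1]

# Pad with a 0 on both sides so runs touching either end still produce
# a +1 (start) and a -1 (end) in the difference array.
padded = np.concatenate([[0], flag, [0]])
d = np.diff(padded)

begin = np.flatnonzero(d == 1)   # index of the first 1 in each run
end = np.flatnonzero(d == -1)    # index one past the last 1 in each run

print(begin)  # [ 3  8 13]
print(end)    # [ 5 12 15]
```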

import numpy as np

flag   = [0,0,0,1,1,0,0,0,1,1,1,1,0,1,1]
values = [1,5,6,8,5,6,2,0,1,9,3,8,3,6,2]

d = np.diff(np.concatenate([[0], flag, [0]]))  # Prepend and append a 0 so the procedure also works if flag starts or ends with 1.

begin = np.flatnonzero(d==1)
end = np.flatnonzero(d==-1)

result = np.zeros_like(values, dtype=float)

for a, b in zip(begin, end):
    result[a:b] = np.median(values[a:b])

print(result)
# [ 0.   0.   0.   6.5  6.5  0.   0.   0.   5.5  5.5  5.5  5.5  0.   4.   4. ]
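If the remaining Python loop ever becomes the bottleneck (for example with very many short runs), the per-run medians can also be computed without a loop by sorting. This is a swapped-in alternative, not the answer's method, and it has not been benchmarked here. It assumes `flag` holds only 0s and 1s and `values` contains no NaNs; `run_medians_sorted` is a hypothetical name.

```python
import numpy as np

def run_medians_sorted(flag, values):
    """Hypothetical loop-free sketch: median of `values` over each run of 1s in `flag`, 0 elsewhere."""
    flag = np.asarray(flag)
    values = np.asarray(values, dtype=float)
    result = np.zeros(values.shape, dtype=float)

    # Same transition detection as in the answer.
    d = np.diff(np.concatenate([[0], flag, [0]]))
    begin = np.flatnonzero(d == 1)
    end = np.flatnonzero(d == -1)
    if begin.size == 0:
        return result  # no runs of 1s at all

    lengths = end - begin
    mask = flag == 1
    flagged = values[mask]                              # values inside runs, in original order
    run_id = np.repeat(np.arange(begin.size), lengths)  # run index of each flagged value

    # np.lexsort uses the last key as the primary key: sort by run, then by value,
    # so each run's values end up contiguous and sorted.
    sorted_vals = flagged[np.lexsort((flagged, run_id))]

    # Run k occupies [starts[k], starts[k] + lengths[k]) in sorted_vals;
    # average its middle element(s) to get the median.
    starts = np.cumsum(lengths) - lengths
    lo = starts + (lengths - 1) // 2
    hi = starts + lengths // 2
    medians = (sorted_vals[lo] + sorted_vals[hi]) / 2

    result[mask] = np.repeat(medians, lengths)
    return result

flag   = [0,0,0,1,1,0,0,0,1,1,1,1,0,1,1]
values = [1,5,6,8,5,6,2,0,1,9,3,8,3,6,2]
print(run_medians_sorted(flag, values).tolist())
# [0.0, 0.0, 0.0, 6.5, 6.5, 0.0, 0.0, 0.0, 5.5, 5.5, 5.5, 5.5, 0.0, 4.0, 4.0]
```

This does one sort over all flagged values instead of one np.median call per run, so it is likely to help mostly when there are many short runs; with a few long runs the loop above is already cheap.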

2 Comments

This seems like the obvious approach for this problem.
Thanks! It's about 100 times faster on arrays of 10,000 elements.