A proper way of filling numpy array based on multiple conditions

Question

Lets say I got 4 numpy arrays of shape (1000,1000):

a1, a2, a3, a4...

I would like fill a4 with different values and conditions based on the other 3 arrays.

For example (a basic example, made for the purposes of the question):

if a1 < a2:
   a4 = a3 / a2
elif a2 = a1: 
   a4 = a1: a4 = a2 + a1
else:
   if a3 < a1:
      a4 = a1 * a2 * a3
   else:
      a4 = a1 / a2 - a3

What is the proper and the most effective way to fill a4 properly? np.where seems to be invalid way, as it's hard to implement condition inside a condition... Should I use np.putmask() multiple times? won't it make it really slow? (something like below)

np.putmask(a4,(a1 < a2),a3 / a2)
np.putmask(a4,(a2 = a1),a2 + a1)
np.putmask(a4,!((a2 = a1) & (a1 < a2)) & (a3 < a1),a1 * a2 * a3)
np.putmask(a4,!((a2 = a1) & (a1 < a2)) & (a3 < a1),a1 / a2 - a3)

Is there any other way?

Basically I got a task to turn some complex math formulas with multiple cross-overing conditions based on single values to the ones based on whole arrays of these numbers.

In your example else ifis not valid python and the else: branch could be written as a second elif: ... else:. To chain multiple if elif you can use np.select. It would be easier to help with example input und expected output data. — Michael Szczesny
– Michael Szczesny, Commented Nov 9, 2020 at 13:50
it's a schema, not python code; hard to give real data, as these are spatial images (1000x1000px), 15+ MB each — Flash Thunder
– Flash Thunder, Commented Nov 9, 2020 at 13:54
np.select doesn't seem to help in that task, it just selects elements, I need to do math on arrays, whole ones — Flash Thunder
– Flash Thunder, Commented Nov 9, 2020 at 14:09
I'm not sure why np.select is not the choice. Something like np.select((a1<a2, a2==a1), (a3/a2, a2+a1))? — Quang Hoang
– Quang Hoang, Commented Nov 9, 2020 at 14:20

jmd_dk · Accepted Answer · 2023-05-29 15:09:44Z

4

Something like np.putmask is the "proper" NumPy way to go. You can of course write these out as e.g. mask = a1 < a2; a4[mask] = (a3/a2)[mask] (if I understand your pseudo code correctly), but that's not necessarily more clear.

If you're concerned that it's slow because the arrays have to be traversed several times, that's just how it is using NumPy. A separate (and probably faster) way to go is to use Numba. Here's an example:

from time import time
import numpy as np
import numba

@numba.njit
def f(a1, a2, a3):
    a4 = np.empty(a1.shape)
    for i in range(a1.shape[0]):
        for j in range(a1.shape[1]):
            if a1[i, j] < a2[i, j]:
                a4[i, j] = a3[i, j] / a2[i, j]
            elif a2[i, j] == a1[i, j]:
                a4[i, j] = a2[i, j] + a1[i, j]
            else:
                if a3[i, j] < a1[i, j]:
                    a4[i, j] = a1[i, j] * a2[i, j] * a3[i, j]
                else:
                    a4[i, j] = a1[i, j] / a2[i, j] - a3[i, j]
    return a4

a1 = np.random.random((1000, 1000))
a2 = np.random.random((1000, 1000))
a3 = np.random.random((1000, 1000))
t0 = time()
a4 = f(a1, a2, a3)  # First call which also compiles
t1 = time()
a4 = f(a1, a2, a3)  # Fast call
t2 = time()
print('Timings:', t1 - t0, t2 - t1)

With Numba you just write everything out with indices as in the style of your pseudo code. This is slow in pure Python but fast in Numba. In the above, I've placed timings. Removing the @numba.njit line makes it about 125 times (!) slower on my machine. Note that the first call is not as fast as all others, as the first call has some compilation overhead. So this is mostly useful if you have to call this function many times.

To make it even faster, use

@numba.njit(error_model='numpy')

which deactivates some division-by-zero checks.

edited May 29, 2023 at 15:09

answered Nov 9, 2020 at 14:25

jmd_dk

13.2k11 gold badges71 silver badges104 bronze badges

Sign up to request clarification or add additional context in comments.

2 Comments

Michael Szczesny Over a year ago

10x faster than np.select. Nice!

Flash Thunder Over a year ago

why can't I accept both answers? :( I would accept this one, as it says "you are not that stupid, your solution is fine" and it gives fine alternative, but the second answer was pure numpy, thank you very much anyway

Michael Szczesny · Accepted Answer · 2020-11-09 14:58:58Z

An example for 5x5 arrays with np.select. The code is the same for larger arrays.

import numpy as np
np.random.seed(10)

a1 = np.random.randint(1,3,(5,5))
a2 = np.random.randint(1,3,(5,5))
a3 = np.random.randint(1,3,(5,5))
print(a1,a2,a3, sep='\n')

Out:

[[2 2 1 2 1]
 [2 2 1 2 2]
 [1 2 2 1 1]
 [2 1 1 1 1]
 [1 2 1 1 2]]
[[2 1 1 2 1]
 [1 2 1 1 1]
 [2 2 1 2 2]
 [2 2 2 1 2]
 [1 1 1 1 2]]
[[1 2 2 2 1]
 [2 1 2 2 1]
 [2 1 1 2 1]
 [1 1 2 2 1]
 [1 1 2 1 2]]

Your conditions and computations on arrays

a4 = np.select(
    [a1 < a2, a2 == a1, a3 < a1],      # list of conditions
    [a3 / a2, a2 + a1 , a1 * a2 * a3], # list of corresponding computations
    a1 / a2 - a3                       # last 'else' computation
)
print(a4)

Out:

[[4.  0.  2.  4.  2. ]
 [0.  4.  2.  0.  2. ]
 [1.  4.  2.  1.  0.5]
 [4.  0.5 1.  2.  0.5]
 [2.  2.  2.  2.  4. ]]

Micro-Benchmark

Benchmarked np.select vs @numba.njit(error_model='numpy')(hot run) with 1000x1000 arrays on a colab instance

import numpy as np
import numba

a1 = np.random.random((1000, 1000))
a2 = np.random.random((1000, 1000))
a3 = np.random.random((1000, 1000))

First asserting equality and jit compiling

np.testing.assert_array_equal(np.select(
    [a1 < a2, a2 == a1, a3 < a1],
    [a3 / a2, a2 + a1, a1 * a2 * a3],
    a1 / a2 - a3
), f(a1, a2, a3))

Results

%%timeit
f(a1, a2, a3)

Out:

100 loops, best of 3: 3.77 ms per loop

%%timeit
np.select(
    [a1 < a2, a2 == a1, a3 < a1],
    [a3 / a2, a2 + a1, a1 * a2 * a3],
    a1 / a2 - a3
)

Out:

10 loops, best of 3: 37.2 ms per loop

why can't I accept both answers? :( I am accepting this one, as it's numpy, but the second answer is interesting as well

Collectives™ on Stack Overflow

A proper way of filling numpy array based on multiple conditions

2 Answers 2

2 Comments

Micro-Benchmark

Results

1 Comment

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

2 Comments

Micro-Benchmark

Results

1 Comment

Your Answer

Sign up or log in

Post as a guest

Related