0

Lets say I got 4 numpy arrays of shape (1000,1000):

a1, a2, a3, a4...

I would like fill a4 with different values and conditions based on the other 3 arrays.

For example (a basic example, made for the purposes of the question):

if a1 < a2:
   a4 = a3 / a2
elif a2 = a1: 
   a4 = a1: a4 = a2 + a1
else:
   if a3 < a1:
      a4 = a1 * a2 * a3
   else:
      a4 = a1 / a2 - a3

What is the proper and the most effective way to fill a4 properly? np.where seems to be invalid way, as it's hard to implement condition inside a condition... Should I use np.putmask() multiple times? won't it make it really slow? (something like below)

np.putmask(a4,(a1 < a2),a3 / a2)
np.putmask(a4,(a2 = a1),a2 + a1)
np.putmask(a4,!((a2 = a1) & (a1 < a2)) & (a3 < a1),a1 * a2 * a3)
np.putmask(a4,!((a2 = a1) & (a1 < a2)) & (a3 < a1),a1 / a2 - a3)

Is there any other way?

Basically I got a task to turn some complex math formulas with multiple cross-overing conditions based on single values to the ones based on whole arrays of these numbers.

4
  • In your example else ifis not valid python and the else: branch could be written as a second elif: ... else:. To chain multiple if elif you can use np.select. It would be easier to help with example input und expected output data. Commented Nov 9, 2020 at 13:50
  • it's a schema, not python code; hard to give real data, as these are spatial images (1000x1000px), 15+ MB each Commented Nov 9, 2020 at 13:54
  • np.select doesn't seem to help in that task, it just selects elements, I need to do math on arrays, whole ones Commented Nov 9, 2020 at 14:09
  • I'm not sure why np.select is not the choice. Something like np.select((a1<a2, a2==a1), (a3/a2, a2+a1))? Commented Nov 9, 2020 at 14:20

2 Answers 2

4

Something like np.putmask is the "proper" NumPy way to go. You can of course write these out as e.g. mask = a1 < a2; a4[mask] = (a3/a2)[mask] (if I understand your pseudo code correctly), but that's not necessarily more clear.

If you're concerned that it's slow because the arrays have to be traversed several times, that's just how it is using NumPy. A separate (and probably faster) way to go is to use Numba. Here's an example:

from time import time
import numpy as np
import numba

@numba.njit
def f(a1, a2, a3):
    a4 = np.empty(a1.shape)
    for i in range(a1.shape[0]):
        for j in range(a1.shape[1]):
            if a1[i, j] < a2[i, j]:
                a4[i, j] = a3[i, j] / a2[i, j]
            elif a2[i, j] == a1[i, j]:
                a4[i, j] = a2[i, j] + a1[i, j]
            else:
                if a3[i, j] < a1[i, j]:
                    a4[i, j] = a1[i, j] * a2[i, j] * a3[i, j]
                else:
                    a4[i, j] = a1[i, j] / a2[i, j] - a3[i, j]
    return a4

a1 = np.random.random((1000, 1000))
a2 = np.random.random((1000, 1000))
a3 = np.random.random((1000, 1000))
t0 = time()
a4 = f(a1, a2, a3)  # First call which also compiles
t1 = time()
a4 = f(a1, a2, a3)  # Fast call
t2 = time()
print('Timings:', t1 - t0, t2 - t1)

With Numba you just write everything out with indices as in the style of your pseudo code. This is slow in pure Python but fast in Numba. In the above, I've placed timings. Removing the @numba.njit line makes it about 125 times (!) slower on my machine. Note that the first call is not as fast as all others, as the first call has some compilation overhead. So this is mostly useful if you have to call this function many times.

To make it even faster, use

@numba.njit(error_model='numpy')

which deactivates some division-by-zero checks.

Sign up to request clarification or add additional context in comments.

2 Comments

10x faster than np.select. Nice!
why can't I accept both answers? :( I would accept this one, as it says "you are not that stupid, your solution is fine" and it gives fine alternative, but the second answer was pure numpy, thank you very much anyway
2

An example for 5x5 arrays with np.select. The code is the same for larger arrays.

import numpy as np
np.random.seed(10)

a1 = np.random.randint(1,3,(5,5))
a2 = np.random.randint(1,3,(5,5))
a3 = np.random.randint(1,3,(5,5))
print(a1,a2,a3, sep='\n')

Out:

[[2 2 1 2 1]
 [2 2 1 2 2]
 [1 2 2 1 1]
 [2 1 1 1 1]
 [1 2 1 1 2]]
[[2 1 1 2 1]
 [1 2 1 1 1]
 [2 2 1 2 2]
 [2 2 2 1 2]
 [1 1 1 1 2]]
[[1 2 2 2 1]
 [2 1 2 2 1]
 [2 1 1 2 1]
 [1 1 2 2 1]
 [1 1 2 1 2]]

Your conditions and computations on arrays

a4 = np.select(
    [a1 < a2, a2 == a1, a3 < a1],      # list of conditions
    [a3 / a2, a2 + a1 , a1 * a2 * a3], # list of corresponding computations
    a1 / a2 - a3                       # last 'else' computation
)
print(a4)

Out:

[[4.  0.  2.  4.  2. ]
 [0.  4.  2.  0.  2. ]
 [1.  4.  2.  1.  0.5]
 [4.  0.5 1.  2.  0.5]
 [2.  2.  2.  2.  4. ]]

Micro-Benchmark

Benchmarked np.select vs @numba.njit(error_model='numpy')(hot run) with 1000x1000 arrays on a colab instance

import numpy as np
import numba

a1 = np.random.random((1000, 1000))
a2 = np.random.random((1000, 1000))
a3 = np.random.random((1000, 1000))

First asserting equality and jit compiling

np.testing.assert_array_equal(np.select(
    [a1 < a2, a2 == a1, a3 < a1],
    [a3 / a2, a2 + a1, a1 * a2 * a3],
    a1 / a2 - a3
), f(a1, a2, a3))

Results

%%timeit
f(a1, a2, a3)

Out:

100 loops, best of 3: 3.77 ms per loop
%%timeit
np.select(
    [a1 < a2, a2 == a1, a3 < a1],
    [a3 / a2, a2 + a1, a1 * a2 * a3],
    a1 / a2 - a3
)

Out:

10 loops, best of 3: 37.2 ms per loop

1 Comment

why can't I accept both answers? :( I am accepting this one, as it's numpy, but the second answer is interesting as well

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.