How to vectorize such an algorithm in python?

Question

I have four (n x 1) arrays named a, b, c, and F. I want to run this algorithm without any loops.

for i in range(n):
    if a[i] < b[i]:
        F[i] = 0
    elif a[i] > c[i]:
        F[i] = 1
    elif b[i] <= a[i] <= c[i]:
        F[i] = 2

I want to write this code in a more vectorized way to make it more efficient as my dataset is quite large.

In numpy terms, 'vectorize' means performing the iteration in compiled code. Usually that means using compiled numpy methods. But compiling your own code with numba can do just as well.
– hpaulj, Commented May 13, 2022 at 14:34
Thank you very much for your answer. I've tried your suggestion before, as @AboAmmar suggested.
– Milad Anboohi, Commented May 13, 2022 at 14:40
You could have avoided a lot of that discussion in @NiziL's answer by providing a minimal reproducible example: example arrays that you expect to work.
– hpaulj, Commented May 13, 2022 at 15:25

NiziL · Accepted Answer · 2022-05-13 12:59:05Z

4

I feel like you could use boolean indexing for this task.

F[np.logical_and(b <= a, a <= c)] = 2
F[a > c] = 1
F[a < b] = 0

Beware: the assignment order matters here. Later assignments overwrite earlier ones, so the condition with the highest priority in the if/elif chain must be assigned last.
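One way to avoid depending on assignment order is np.select, which picks the first matching condition for each element, the same priority rule as an if/elif chain. A minimal sketch, assuming 1-D arrays with no NaN values; the sample data and the -1 default (used for elements that match no condition, which the original loop would leave untouched) are illustrative assumptions, not part of the question:

```python
import numpy as np

# Illustrative sample data (not from the question).
a = np.array([0.0, 5.0, 10.0, 3.0])
b = np.array([1.0, 1.0, 1.0, 5.0])
c = np.array([8.0, 8.0, 8.0, 2.0])

# Conditions are listed in the same order as the if/elif chain;
# np.select uses the first one that is True for each element.
conditions = [a < b, a > c, (b <= a) & (a <= c)]
choices = [0, 1, 2]

# default=-1 is an assumed sentinel for elements matching no condition.
F = np.select(conditions, choices, default=-1)
print(F)  # [0 2 1 0]
```

The last element has both a < b and a > c true; it gets 0 because a < b is listed first, just as in the loop. Note that np.select returns a new array rather than writing into an existing F.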

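Some timeit benchmarks (timeit.timeit runs the callable 1,000,000 times by default, so each figure is the total time in seconds):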

def loop(F, a, b, c):
  for i in range(F.shape[0]):
    if a[i] < b[i]:
      F[i] = 0
    elif a[i] > c[i]:
      F[i] = 1
    elif b[i] <= a[i] <= c[i]:
      F[i] = 2

def idx(F, a, b, c):
  F[np.logical_and(b <= a, a <= c)] = 2
  F[a > c] = 1
  F[a < b] = 0

with (10x1) array:

>>> timeit.timeit(lambda: loop(F, a, b, c))
11.585818066001593
>>> timeit.timeit(lambda: idx(F, a, b, c))
3.337863392000145

with (1000x1) array:

>>> timeit.timeit(lambda: loop(F, a, b, c))
1457.268110728999
>>> timeit.timeit(lambda: idx(F, a, b, c))
10.00236530300026

edited May 13, 2022 at 12:59

answered May 13, 2022 at 9:01


7 Comments

Milad Anboohi Over a year ago

Thank you very much. As you said, the assignment order matters for getting the desired outcome. I tried your suggested code with the assignments in various orders, and the results differed. How can I overcome this problem?

Milad Anboohi Over a year ago

Actually, I want the code to do the calculations (based on the if-else conditions) for all members of the array F, simultaneously.

NiziL Over a year ago

Well, the order in your if-else block actually matters too, doesn't it? a[i] < b[i] and a[i] > c[i] can both be true, so whichever condition is checked first takes priority. So I've basically reversed your condition order to ensure that F[...] = 0 overwrites the other assignments and thus has "priority". Not sure if I'm making myself clear here...

Milad Anboohi Over a year ago

The order in my if-else block doesn't make any difference to the final results. But the results of your suggested code are affected by the order.

NiziL Over a year ago

@MiladAnboohi Really? If a[i] < b[i] and a[i] > c[i] are both true, the order matters. I don't see how the result can stay the same without any constraint on b and c... maybe b is always smaller than c? Btw, I don't think you can "overcome" this problem with my approach; you just have to find the right order.
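The point about ordering can be checked directly. The sketch below, assuming 1-D random inputs with no constraint between b and c (the data, the -1 initial value, and the function names are assumptions of this example), compares the loop from the question with boolean indexing applied in reversed order and in the loop's own order:

```python
import numpy as np

def loop_ref(F, a, b, c):
    # The if/elif loop from the question: the first true condition wins.
    for i in range(F.shape[0]):
        if a[i] < b[i]:
            F[i] = 0
        elif a[i] > c[i]:
            F[i] = 1
        elif b[i] <= a[i] <= c[i]:
            F[i] = 2
    return F

def idx_reversed(F, a, b, c):
    # Lowest-priority condition first, highest-priority last.
    F[(b <= a) & (a <= c)] = 2
    F[a > c] = 1
    F[a < b] = 0
    return F

def idx_same_order(F, a, b, c):
    # Same textual order as the loop: later writes overwrite earlier ones.
    F[a < b] = 0
    F[a > c] = 1
    F[(b <= a) & (a <= c)] = 2
    return F

rng = np.random.default_rng(0)
n = 1000
a, b, c = rng.random(n), rng.random(n), rng.random(n)

ref = loop_ref(np.full(n, -1), a, b, c)
rev = idx_reversed(np.full(n, -1), a, b, c)
same = idx_same_order(np.full(n, -1), a, b, c)

# Elements where two conditions are true at once (possible only if c < b).
overlap = (a < b) & (a > c)

print(np.array_equal(ref, rev))                      # True
print(np.array_equal(ref[~overlap], same[~overlap])) # True
print(np.array_equal(ref, same))                     # False for this data
```

If b <= c holds everywhere, the overlap is empty, the three conditions are mutually exclusive, and any assignment order gives the same result. That would explain why the order appears not to matter for some datasets.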


AboAmmar · Accepted Answer · 2022-05-13 11:51:26Z

2

If you care about performance, why not try numba? It might be 10x faster than the boolean-indexing approach while saving memory at the same time. As a bonus, the loop code you wrote stays intact; you only add an @njit decorator above the function.

from numba import njit

@njit
def loop(F, a, b, c):
  for i in range(F.shape[0]):
    if a[i] < b[i]:
      F[i] = 0
    elif a[i] > c[i]:
      F[i] = 1
    elif b[i] <= a[i] <= c[i]:
      F[i] = 2

Comparing with the vectorized solution by @NiziL, using vectors of size 100 and 1000:

timeit(lambda: loop(F, a, b, c))
timeit(lambda: idx(F, a, b, c))

Gives:

# 1.0355658 (Size: 100, @njit loop)
# 4.6863165 (Size: 100, idx)

# 1.9563843 (Size: 1000, @njit loop)
# 16.658198 (Size: 1000, idx)
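Before comparing timings like these, it helps to confirm that both functions give the same result and to pass an explicit number to timeit. A sketch, assuming 1-D random inputs; the array size, the repeat count, and the fallback to a no-op decorator when numba is not installed are assumptions of this example, and no particular timings are claimed:

```python
import timeit
import numpy as np

try:
    from numba import njit
except ImportError:
    # Assumed fallback so the sketch still runs without numba (pure Python).
    def njit(func):
        return func

@njit
def loop(F, a, b, c):
    for i in range(F.shape[0]):
        if a[i] < b[i]:
            F[i] = 0
        elif a[i] > c[i]:
            F[i] = 1
        elif b[i] <= a[i] <= c[i]:
            F[i] = 2

def idx(F, a, b, c):
    F[np.logical_and(b <= a, a <= c)] = 2
    F[a > c] = 1
    F[a < b] = 0

rng = np.random.default_rng(0)
n = 1000
a, b, c = rng.random(n), rng.random(n), rng.random(n)
F_loop = np.zeros(n, dtype=np.int64)
F_idx = np.zeros(n, dtype=np.int64)

# Warm-up call: under numba this also triggers JIT compilation,
# which should be kept out of the timed region.
loop(F_loop, a, b, c)
idx(F_idx, a, b, c)
same_result = np.array_equal(F_loop, F_idx)

t_loop = timeit.timeit(lambda: loop(F_loop, a, b, c), number=100)
t_idx = timeit.timeit(lambda: idx(F_idx, a, b, c), number=100)
print(same_result, t_loop, t_idx)
```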

answered May 13, 2022 at 11:51


3 Comments

Milad Anboohi Over a year ago

Thank you very much. It works correctly. But is it the best way to vectorize this algorithm?

AboAmmar Over a year ago

Yes, numba does a really good job of optimizing any Python code that has loops, unless there is an optimized low-level version, from numpy for example, that does the whole algorithm in one pass. Even if some steps of the algorithm are available as numpy functions, the numba version will still be faster because it optimizes the whole function without having to store intermediate results.

Milad Anboohi Over a year ago

Thank you very much for the useful hints. I ran my code following your instructions, and it was amazingly fast with numba.
