
I have four arrays of shape (n, 1) named a, b, c, and F. I want to run this algorithm without any loops.

for i in range(n):
    if a[i] < b[i]:
        F[i] = 0
    elif a[i] > c[i]:
        F[i] = 1
    elif b[i] <= a[i] <= c[i]:
        F[i] = 2

I want to write this code in a more vectorized way to make it more efficient as my dataset is quite large.

  • In numpy terms, 'vectorize' means performing the iteration in compiled code. Usually that means using compiled numpy methods. But compiling your own code with numba can do just as well. Commented May 13, 2022 at 14:34
  • Thank you very much for your answer. I've tried your suggestion before, as @AboAmmar suggested. Commented May 13, 2022 at 14:40
  • You could have avoided a lot of that discussion under @NiziL's answer by providing a minimal reproducible example: example arrays that you expect to work. Commented May 13, 2022 at 15:25
  • Yeah you're right 😂😂 Commented May 13, 2022 at 18:23

2 Answers

I feel like you could use boolean indexing for this task.

F[np.logical_and(b <= a, a <= c)] = 2
F[a > c] = 1
F[a < b] = 0

Beware: the assignment order matters here. Later assignments overwrite earlier ones, so the condition that the if/elif chain checks first must be assigned last.
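To make the ordering point concrete, here is a minimal sketch. The sample arrays and the two function names are made up for illustration, and 1-D arrays are used for brevity. Index 1 is chosen so that b > c, which makes both a < b and a > c true there; that is the only situation in which the order changes the result.

```python
import numpy as np

# Hypothetical sample data. At index 1, b > c, so a < b and a > c both hold.
a = np.array([1.0, 5.0, 9.0, 5.0])
b = np.array([2.0, 7.0, 3.0, 3.0])
c = np.array([8.0, 4.0, 6.0, 6.0])

def assign_in_answer_order(a, b, c):
    F = np.full(a.shape, -1)          # -1 is an arbitrary "unset" marker
    F[np.logical_and(b <= a, a <= c)] = 2
    F[a > c] = 1
    F[a < b] = 0                      # written last, so it wins on overlap
    return F

def assign_in_swapped_order(a, b, c):
    F = np.full(a.shape, -1)
    F[np.logical_and(b <= a, a <= c)] = 2
    F[a < b] = 0
    F[a > c] = 1                      # written last, so it overrides a < b
    return F

print(assign_in_answer_order(a, b, c))   # [0 0 1 2], same as the if/elif loop
print(assign_in_swapped_order(a, b, c))  # [0 1 1 2], differs at index 1
```

If b <= c holds everywhere, the three masks cannot overlap and both orders give the same answer.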

Some timeit benchmarks:

def loop(F, a, b, c):
  for i in range(F.shape[0]):
    if a[i] < b[i]:
      F[i] = 0
    elif a[i] > c[i]:
      F[i] = 1
    elif b[i] <= a[i] <= c[i]:
      F[i] = 2

def idx(F, a, b, c):
  F[np.logical_and(b <= a, a <= c)] = 2
  F[a > c] = 1
  F[a < b] = 0

With (10x1) arrays:

>>> timeit.timeit(lambda: loop(F, a, b, c))
11.585818066001593
>>> timeit.timeit(lambda: idx(F, a, b, c))
3.337863392000145

With (1000x1) arrays:

>>> timeit.timeit(lambda: loop(F, a, b, c))
1457.268110728999
>>> timeit.timeit(lambda: idx(F, a, b, c))
10.00236530300026
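A note on reading these numbers: timeit.timeit runs the callable 1,000,000 times by default and returns the total time in seconds, so the figures above are totals, not per-call times. The sketch below passes an explicit number and reports a per-call time. The random 1-D test data and the choice of 1000 repetitions are assumptions for illustration, not the setup used for the figures above.

```python
import timeit
import numpy as np

def idx(F, a, b, c):
    F[np.logical_and(b <= a, a <= c)] = 2
    F[a > c] = 1
    F[a < b] = 0

# Hypothetical test data: random values, fixed seed for repeatability.
rng = np.random.default_rng(0)
n = 1000
a, b, c = rng.random(n), rng.random(n), rng.random(n)
F = np.zeros(n)

number = 1000  # explicit repetition count instead of the 1,000,000 default
total = timeit.timeit(lambda: idx(F, a, b, c), number=number)
print(f"{total / number * 1e6:.1f} us per call")
```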

7 Comments

Thank you very much. As you said, the assignment order is important here to get the desired outcome. I tried your suggested code with the statements in various orders, and the results were different. How can I overcome this problem?
Actually, I want the code to do the calculations (based on the if-else conditions) for all members of the array F, simultaneously.
Well, the order in your if-else block is actually important too, isn't it? a[i] < b[i] and a[i] > c[i] can both be true, so the condition checked first takes priority. So I've basically reversed your condition order to ensure F[...] = 0 overwrites the other assignments, and so has the "priority"... Not sure if I'm being clear here...
The order in my if-else block doesn't make any difference in the final results. But your suggestion code results are affected by the orders.
@MiladAnboohiso Really? If a[i] < b[i] and a[i] > c[i] are both true, the order is important. I don't see how the result can stay the same without any constraint on b and c... maybe b is always smaller than c? By the way, I don't think you can "overcome" this problem with my approach; you just have to find the right order.
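One way to sidestep the ordering question is np.select, which is not used anywhere in this thread and is offered here only as a sketch. For each element it takes the choice belonging to the first condition that is true, which mirrors an if/elif chain. The function name classify and the -1 fallback value are assumptions for illustration; unlike the loop, this returns a new array rather than writing into F.

```python
import numpy as np

def classify(a, b, c):
    # Conditions are listed in the same order as the if/elif chain;
    # np.select uses the first one that is true for each element.
    conditions = [a < b, a > c, (b <= a) & (a <= c)]
    choices = [0, 1, 2]
    # -1 marks elements where no condition holds (e.g. NaN inputs).
    return np.select(conditions, choices, default=-1)

# Hypothetical sample data; index 1 has both a < b and a > c true.
a = np.array([1.0, 5.0, 9.0, 5.0])
b = np.array([2.0, 7.0, 3.0, 3.0])
c = np.array([8.0, 4.0, 6.0, 6.0])
print(classify(a, b, c))  # [0 0 1 2]
```

To write into an existing array instead, something like F[...] = classify(a, b, c) should work as long as the shapes match.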

If you care about performance, why not try numba? It might be up to 10x faster than the boolean-indexing version while also saving memory. As a bonus, the loop code you wrote stays intact; you only add an @njit decorator above the function.

from numba import njit

@njit
def loop(F, a, b, c):
  for i in range(F.shape[0]):
    if a[i] < b[i]:
      F[i] = 0
    elif a[i] > c[i]:
      F[i] = 1
    elif b[i] <= a[i] <= c[i]:
      F[i] = 2

Comparing with the vectorized solution by @NiziL, using vectors of size 100 and 1000:

timeit(lambda: loop(F, a, b, c))
timeit(lambda: idx(F, a, b, c))

Gives:

# 1.0355658 (Size: 100, @njit loop)
# 4.6863165 (Size: 100, idx)

# 1.9563843 (Size: 1000, @njit loop)
# 16.658198 (Size: 1000, idx)

3 Comments

Thank you very much. It works correctly. But is it the best way to vectorize this algorithm?
Yes, numba does a really good job of optimizing Python code with loops, unless there is an optimized low-level version, from numpy for example, that does the whole algorithm in one pass. Even if some steps of the algorithm are available as numpy functions, the numba version will still be faster because it optimizes the whole function without having to store intermediate results.
Thank you very much for the useful hints. I ran my code following your instructions, and it was amazingly fast with numba.
