
I have a huge numpy array with shape (50000000, 3) and I'm using:

x = array[np.where((array[:,0] == value) | (array[:,1] == value))]

to get the part of the array that I want. But this way seems to be quite slow. Is there a more efficient way of performing the same task with numpy?

3 Comments

  • array[ (array[:,:2]==value).any(axis=1) ] might be a bit faster Commented Nov 17, 2022 at 23:25
  • It's not in my case, but thanks for the answer; maybe there is no better way. Commented Nov 18, 2022 at 0:12
  • That's a straightforward use of numpy methods and indexing. Why should there be something faster? - unless you want to try your hand at numba/cython. Commented Nov 18, 2022 at 1:06

1 Answer

np.where is highly optimized and I doubt anyone can write faster code than the one implemented in the latest Numpy version (disclaimer: I am the one who optimized it). That being said, the main issue here is not so much np.where itself as the conditional expression, which creates a temporary boolean array. This is unfortunately the idiomatic way to do it in Numpy, and there is not much you can do about it as long as you use only Numpy with the same input layout.

One reason it is not very efficient is that the input data layout is inefficient. Indeed, assuming array is stored contiguously in memory using the default row-major ordering, array[:,0] == value reads only 1 item out of every 3 in memory. Due to the way CPU caches work (ie. cache lines, prefetching, etc.), 2/3 of the memory bandwidth is wasted. In addition, the output boolean array also needs to be written, and filling a newly-created array is a bit slow due to page faults. Note that array[:,1] == value will certainly reload the data from RAM due to the size of the input (which cannot fit in most CPU caches). RAM is slow, and it is getting slower relative to the computational speed of CPUs and caches. This problem, called the "memory wall", was observed decades ago and is not expected to be fixed any time soon. Also note that the logical-or creates yet another array that is read from and written to RAM. A better data layout would be a (3, 50000000) transposed array contiguous in memory (note that np.transpose does not produce a contiguous array).
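If you control how the data is produced, you can build it directly in the transposed layout so each comparison scans contiguous memory. A minimal sketch (the sizes, seed, and values here are made up for illustration):

```python
import numpy as np

# Hypothetical smaller example; in practice you would build the data
# in this (3, n) layout from the start rather than transposing later.
n = 1_000_000
rng = np.random.default_rng(0)
array_t = np.ascontiguousarray(rng.integers(0, 100, size=(3, n)))

value = 42
# Each comparison now reads a contiguous row instead of a strided column.
mask = (array_t[0] == value) | (array_t[1] == value)
x = array_t[:, mask]  # selected columns, shape (3, k)
```

The np.ascontiguousarray call is what guarantees the row-major contiguity that np.transpose alone would not give you.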

Another reason for the performance issue is that Numpy tends not to be optimized to operate on very small axes (here, the axis of size 3).

The main solution is to create the input in transposed form in the first place, if possible. Another solution is to write Numba or Cython code. Here is a Numba implementation operating on the non-transposed input:

import numba as nb
import numpy as np

# Compilation for the most frequent types.
# Please pick only the ones you need, to speed up compilation time.
@nb.njit(['(uint8[:,::1],uint8)', '(int32[:,::1],int32)', '(int64[:,::1],int64)', '(float64[:,::1],float64)'], parallel=True)
def select(array, value):
    n = array.shape[0]
    mask = np.empty(n, dtype=np.bool_)
    for i in nb.prange(n):
        mask[i] = array[i, 0] == value or array[i, 1] == value
    return mask

x = array[select(array, value)]

Note that I used a parallel implementation, since the or operator is sub-optimal with Numba (the only workaround seems to be native code or Cython) and because the RAM cannot be fully saturated by one thread on some platforms like computing servers. Also note that it can be faster to use array[np.where(select(array, value))[0]], depending on the result of select. Indeed, if the matching rows are randomly scattered or very few, then np.where can be faster, since it has special optimizations for these cases that boolean indexing does not perform. Note that np.where is not particularly optimized in the context of a Numba function, since Numba uses its own implementations of Numpy functions and they are sometimes not as optimized for large arrays. A faster implementation would build x in parallel, but this is not trivial to do with Numba, since the number of output items is not known ahead of time and the threads must know where to write their data, not to mention that Numpy is already fairly fast at doing this sequentially as long as the output is predictable.
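To make the two indexing paths mentioned above concrete, here is a small self-contained check (sizes, seed, and value are arbitrary; the mask is computed with plain Numpy rather than the Numba select for portability) showing that boolean-mask indexing and integer indexing via np.where select the same rows:

```python
import numpy as np

rng = np.random.default_rng(0)
array = rng.integers(0, 1000, size=(100_000, 3))
value = 7

# Boolean mask equivalent to what select() returns.
mask = (array[:, 0] == value) | (array[:, 1] == value)

x_bool = array[mask]                 # boolean indexing
x_idx = array[np.where(mask)[0]]     # integer indexing via np.where

# Both paths select exactly the same rows; only their speed differs
# depending on how sparse/scattered the matches are.
assert np.array_equal(x_bool, x_idx)
```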


3 Comments

Thank you, it is indeed faster when I use array[np.where(select(array, value))[0]]. I will also try to transpose the input.
Nice numba code, easy to understand. My only question, since I'm not familiar with the package: what am I supposed to use as the first parameter of the njit function? My array values are int64, but maybe I can change them.
If your values are int64 ones, then you can replace the list with just '(int64[:,::1],int64)'. The first parameter is called the signature of the function (it is a string, a Numba function type, or a list of multiple signatures). By the way, note that the bigger the integer type, the slower the operation tends to be, since the input array must be read from RAM. For example, the optimal time to read the input with a RAM bandwidth of 20 GiB/s is 56 ms with int64 items, but only 28 ms with int32 items.
