
I need to find a small numpy array in a much larger numpy array. For example:

import numpy as np
a = np.array([1, 1])
b = np.array([2, 3, 3, 1, 1, 1, 8, 3, 1, 6, 0, 1, 1, 3, 4])

A function

find_numpy_array_in_other_numpy_array(a, b)

should return indices

[3, 4, 11]

that represent where the complete numpy array a appears in the complete numpy array b.

There is a brute force approach to this problem that is slow when dealing with very large b arrays:

ok = []
for idx in range(b.size - a.size + 1):
    if np.all(a == b[idx : idx + a.size]):
        ok.append(idx)

I am looking for a much faster way to find all indices of the full array a in array b. The fast approach should also support other comparison functions, e.g. finding the worst-case difference between a and b:

diffs = []
for idx in range(b.size - a.size + 1):
    bi = b[idx : idx + a.size]
    diff = np.nanmax(np.abs(bi - a))
    diffs.append(diff)

3 Answers


Generic solution setup

For a generic solution, we can create a 2D array of sliding windows and then perform the relevant operations -

from skimage.util.shape import view_as_windows

b2D = view_as_windows(b, len(a))

The same windows can also be built with NumPy's own np.lib.stride_tricks.sliding_window_view (NumPy 1.20+).
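A sketch of that NumPy-only route, assuming NumPy 1.20 or later (where sliding_window_view was added); for 1D input it yields the same 2D window view as view_as_windows:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

a = np.array([1, 1])
b = np.array([2, 3, 3, 1, 1, 1, 8, 3, 1, 6, 0, 1, 1, 3, 4])

# Each row is one length-len(a) window of b; this is a view, not a copy.
b2D = sliding_window_view(b, len(a))
print(b2D.shape)  # (14, 2)
```

Because b2D is a strided view, building it costs O(1) extra memory regardless of b's size.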

Problem #1

Then, to solve the matching-indices problem, it's simply -

matching_indices = np.flatnonzero((b2D == a).all(axis=1))

Problem #2

To solve the second problem, note that any ufunc reduction that produces one output element per window translates into a reduction along the second axis of the windowed array, using that ufunc's axis argument -

diffs = np.nanmax(np.abs(b2D - a), axis=1)
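Putting the two pieces together on the arrays from the question, here is a minimal end-to-end sketch. It uses NumPy's sliding_window_view, which builds the same b2D for 1D input, so it runs without scikit-image (NumPy 1.20+ assumed):

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

a = np.array([1, 1])
b = np.array([2, 3, 3, 1, 1, 1, 8, 3, 1, 6, 0, 1, 1, 3, 4])

# 2D view of all length-len(a) windows of b (no copy is made).
b2D = sliding_window_view(b, len(a))

# Problem #1: rows that equal a exactly.
matching_indices = np.flatnonzero((b2D == a).all(axis=1))
print(matching_indices)  # [ 3  4 11]

# Problem #2: worst-case absolute difference per window.
diffs = np.nanmax(np.abs(b2D - a), axis=1)
print(diffs[matching_indices])  # [0 0 0] - exact matches differ by zero
```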

1 Comment

The implementation with scikit-image is fantastic! A 5-minute code-execution step was reduced to 5 seconds.

The following code finds all matches of the 1st element of your sequence (a) in array b. It then builds a 2D array whose columns are the possible sequence candidates, compares them to the full sequence, and filters the initial indices:

seq, arr = a, b
len_seq = len(seq)
ini_idx = (arr[:-len_seq + 1] == seq[0]).nonzero()[0]  # indices of possible sequence candidates
seq_candidates = arr[np.arange(1, len_seq)[:, None] + ini_idx]  # columns with possible sequence candidates
mask = (seq_candidates == seq[1:, None]).all(axis=0)
idx = ini_idx[mask]
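A quick runnable check of this approach on the question's arrays. The wrapper name find_by_first_element is mine, and the slice is written as arr[:len(arr) - len_seq + 1] so that a length-1 sequence also works:

```python
import numpy as np

def find_by_first_element(seq, arr):
    # Start positions where the first element matches and a full
    # window of len(seq) still fits inside arr.
    len_seq = len(seq)
    ini_idx = (arr[:len(arr) - len_seq + 1] == seq[0]).nonzero()[0]
    # Columns hold the remaining elements of each candidate window.
    seq_candidates = arr[np.arange(1, len_seq)[:, None] + ini_idx]
    # Keep candidates whose tail equals the rest of the sequence.
    mask = (seq_candidates == seq[1:, None]).all(axis=0)
    return ini_idx[mask]

a = np.array([1, 1])
b = np.array([2, 3, 3, 1, 1, 1, 8, 3, 1, 6, 0, 1, 1, 3, 4])
print(find_by_first_element(a, b))  # [ 3  4 11]
```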


You can consider using Numba to compile the function, for example:

import numpy as np
import numba as nb

@nb.njit(parallel=True)
def search_in_array(a, b):
    # One boolean per candidate start position.
    idx = np.empty(len(b) - len(a) + 1, dtype=np.bool_)
    for i in nb.prange(len(idx)):  # parallel loop over windows
        idx[i] = np.all(a == b[i:i + len(a)])
    return np.where(idx)[0]

a = np.array([1, 1])
b = np.array([2, 3, 3, 1, 1, 1, 8, 3, 1, 6, 0, 1, 1, 3, 4])
print(search_in_array(a, b))
# [ 3  4 11]

A quick benchmark:

import numpy as np

np.random.seed(100)
a = np.random.randint(5, size=10)
b = np.random.randint(5, size=10_000_000)

# Non-compiled function
%timeit search_in_array.py_func(a, b)
# 22.8 s ± 242 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

# Compiled function
%timeit search_in_array(a, b)
# 54.7 ms ± 1.31 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

As you see, you can get a ~400x speedup and the memory cost is relatively low (a boolean array the same size as the big array).

