Finding Patterns in a Numpy Array

Question

I am trying to find patterns in a numpy array, called values. I'd like to return the starting index position of the pattern. I know I could iterative over each element and check whether that element and the next one match the pattern, but over a large dataset that is incredibly inefficient and am looking for a better alternative.

I've got a working solution using np.where for searching for a single value, but I can't get it to work with finding a pattern or two numbers.

Example:

import numpy as np
values = np.array([0,1,2,1,2,4,5,6,1,2,1])
searchval = [1,2]
print  np.where(values == searchval)[0]

Output:

[]

Expected Output:

[1, 3, 8]

Maybe Rolling Hashes can help. (en.wikipedia.org/wiki/Rolling_hash) — betontalpfa
– betontalpfa, Commented Feb 27, 2017 at 17:33

Ed Smith · Accepted Answer · 2017-02-27 18:06:00Z

7

Couldn't you simply use np.where (assuming this is the optimal way to find an element) and then only check pattens which satisfy the first condition.

import numpy as np
values = np.array([0,1,2,1,2,4,5,6,1,2,1])
searchval = [1,2]
N = len(searchval)
possibles = np.where(values == searchval[0])[0]

solns = []
for p in possibles:
    check = values[p:p+N]
    if np.all(check == searchval):
        solns.append(p)

print(solns)

edited Feb 27, 2017 at 18:06

answered Feb 27, 2017 at 17:45

Ed Smith

13.3k2 gold badges48 silver badges58 bronze badges

Sign up to request clarification or add additional context in comments.

2 Comments

betontalpfa Over a year ago

If the input is random (there are a few repeat in values) this solution will be fast

Albacete Jul 1 at 15:26

This fails if the searchval[0] is closer to the end of the array than the length of searchval, e.g.: values = np.array([0,1,2,1,2,4,5,6,1,2,1,3]) with searchval = [1,2,3], because it results in different lengths for check and searchval. This can be solved with an extra check of the lengths or a try/except block.

hpaulj · Accepted Answer · 2017-02-27 19:48:46Z

Here's a straight forward approach to using where. Start with a logical expression that finds the matches:

In [670]: values = np.array([0,1,2,1,2,4,5,6,1,2,1])
     ...: searchval = [1,2]
     ...: 
In [671]: (values[:-1]==searchval[0]) & (values[1:]==searchval[1])
Out[671]: array([False,  True, False,  True, False, False, False, False,  True, False], dtype=bool)
In [672]: np.where(_)
Out[672]: (array([1, 3, 8], dtype=int32),)

That could be generalized into a loop that operates on multiple searchval. Getting the slice range correct will take some fiddling. The roll suggested in another answer might be easier, but I suspect a bit slower.

As long as searchval is small compared to values this general approach should be efficient. There is a np.in1d that does this sort of match, but with a or test. So it isn't applicable. But it too uses this iterative approach is the searchval list is small enough.

Generalized slicing

In [716]: values
Out[716]: array([0, 1, 2, 1, 2, 4, 5, 6, 1, 2, 1])
In [717]: searchvals=[1,2,1]
In [718]: idx = [np.s_[i:m-n+1+i] for i in range(n)]
In [719]: idx
Out[719]: [slice(0, 9, None), slice(1, 10, None), slice(2, 11, None)]
In [720]: [values[idx[i]] == searchvals[i] for i in range(n)]
Out[720]: 
[array([False,  True, False,  True, False, False, False, False,  True], dtype=bool),
 array([False,  True, False,  True, False, False, False, False,  True], dtype=bool),
 array([False,  True, False, False, False, False,  True, False,  True], dtype=bool)]
In [721]: np.all(_, axis=0)
Out[721]: array([False,  True, False, False, False, False, False, False,  True], dtype=bool)
In [722]: np.where(_)
Out[722]: (array([1, 8], dtype=int32),)

I used the intermediate np.s_ to look at the slices and make sure they look reasonable.

as_strided

An advanced trick would be to use as_strided to construct the 'rolled' array and perform a 2d == test on that. as_strided is neat but tricky. To use it correctly you have to understand strides, and get the shape correct.

In [740]: m,n = len(values), len(searchvals)
In [741]: values.shape
Out[741]: (11,)
In [742]: values.strides
Out[742]: (4,)
In [743]: 
In [743]: M = as_strided(values, shape=(n,m-n+1),strides=(4,4))
In [744]: M
Out[744]: 
array([[0, 1, 2, 1, 2, 4, 5, 6, 1],
       [1, 2, 1, 2, 4, 5, 6, 1, 2],
       [2, 1, 2, 4, 5, 6, 1, 2, 1]])
In [745]: M == np.array(searchvals)[:,None]
Out[745]: 
array([[False,  True, False,  True, False, False, False, False,  True],
       [False,  True, False,  True, False, False, False, False,  True],
       [False,  True, False, False, False, False,  True, False,  True]], dtype=bool)
In [746]: np.where(np.all(_,axis=0))
Out[746]: (array([1, 8], dtype=int32),)

Good idea on slicing, should be pretty efficient for decent sized patterns.

jnsod · Accepted Answer · 2017-02-27 17:42:04Z

3

I think this does the job:

np.where((values == 1) & (np.roll(values,-1) == 2))[0]

answered Feb 27, 2017 at 17:42

jnsod

665 bronze badges

1 Comment

jnsod Over a year ago

Since we have a great collection of soutions I was currious about the run times and found the hpaulij solutions are around 2 times faster on pretty long random arrays (e.g. 1 mil entries) compared to the simple roll. The solution by Ed Smith is about 100 times slower and the one by betontalpfa another 100 times. Number of entries and hits changes the numbers quite a bit but not the overall ranking.

betontalpfa · Accepted Answer · 2017-02-27 18:09:54Z

If the input is random Ed Smith solution is faster. But if you has a few set of available values this hash-solution can help:

"""
Can be replaced with any revertable hash
"""
def my_hash(rem, h, add):
    return rem^h^add

"""
Imput
"""
values = np.array([0,1,2,1,2,4,5,6,1,2,1])
searchval = [1,2]


"""
Prepare
"""
sh = 0
vh = 0
ls = len(searchval)
lv = len(values)

for i in range(0, len(searchval)):
    vh = my_hash(0, vh, values[i])
    sh = my_hash(0, sh, searchval[i])

"""
Find matches
"""
for i in range(0, lv-ls):
    if sh == vh:
        eq = True
        for j in range(0, ls):
            if values[i+j] != searchval[j]:
                eq = False
                break
        if eq:
            print i
    vh = my_hash(values[i], vh, values[i+ls])

sherdim · Accepted Answer · 2022-07-07 12:50:09Z

2

Compact straitforward solution will be a "legal" variant of as_strided solution. Others have mentioned np.roll. But here is an universal solution with the only circle (132 µs).

seq = np.array([0,1,2,1,2,4,5,6,1,2,1])
patt = np.array([1,2])

Seq = np.vstack([np.roll(seq, shift) for shift in -np.arange(len(patt))]).T
where(all(Seq == patt, axis=1))[0]

The another option for sequences with small integers will be converting to string. It is faster per near 6 times (20 µs). For small positive integers only!

import re

def to_string(arr):
    return ''.join(map(chr, arr))

array([m.start() for m in re.finditer(to_string(patt), to_string(seq))])

edited Jul 7, 2022 at 12:50

answered Apr 2, 2019 at 13:43

sherdim

1,2869 silver badges20 bronze badges

Comments

kho · Accepted Answer · 2024-09-12 14:34:14Z

Another approach is to use scipy.ndimage.generic_filter to build a pattern filter, which also supports multidimensional patterns.

import numpy as np
import scipy.signal, scipy.ndimage

def pattern_filter(input, pattern):
    input = np.array(input)
    pattern = np.array(pattern)
    
    def _pattern_filter_(_input_, _pattern_):
        return np.all(_input_ == _pattern_.flatten())
    
    return scipy.ndimage.generic_filter(
        input, 
        _pattern_filter_, 
        size=pattern.shape,
        origin=-(np.array(pattern.shape)//2), 
        extra_arguments=(pattern,)).astype(bool)
    
values = np.array([0,1,2,1,2,4,5,6,1,2,1])
pattern = [1,2]
overlap = pattern_filter(values, pattern)
np.argwhere(overlap)
# array([[1], [3], [8]])

Multidimensional example:

values = np.array([
    [0,1,2,1,2,4,5,6,1,2,1],
    [0,2,1,2,1,4,5,6,2,1,1],
    [0,0,2,1,0,0,0,0,0,0,0],
])
pattern = [[1,2], 
           [2,1]]
overlap = pattern_filter(values, pattern)
np.argwhere(overlap) 
# array([[0, 1], [0, 3], [0, 8], [1, 2]])

Collectives™ on Stack Overflow

Finding Patterns in a Numpy Array

6 Answers 6

2 Comments

Generalized slicing

as_strided

2 Comments

1 Comment

Comments

Comments

Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

6 Answers 6

2 Comments

Generalized slicing

as_strided

2 Comments

1 Comment

Comments

Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related