I have a numpy array (shape: 10x2) such as the one below:
array
index label feature
0 121 a
1 131 b
2 113 c
3 131 d
4 223 e
5 242 f
6 212 g
7 131 h
8 113 i
9 131 j
I want to find the indices that match a certain sequence and then get the items in the feature column that correspond to that sequence.
E.g. given the sequence [131,113,131], I would like to get indices 1 and 7 (the starting indices), or the lists of indices that correspond to the sequence ([1,2,3] and [7,8,9]), and then finally get the features that correspond to the sequence: [b,c,d] and [h,i,j].
My current solution is below. It gives me the starting indices of the sequences, but it does not generalize well to longer sequences and is a bit difficult to follow:
import numpy as np
v = np.array([[121,1],
[131,1],
[113,1],
[131,1],
[223,1],
[242,1],
[212,1],
[131,1],
[113,1],
[131,1]])
sequence = [131,113,131]
# Starting indices where three consecutive labels equal the sequence
labels = v[:, 0]
c = [ind for ind, x in enumerate(labels)
     if ind + 2 < len(labels)
     and x == sequence[0]
     and labels[ind + 1] == sequence[1]
     and labels[ind + 2] == sequence[2]]
I would prefer a solution that uses only numpy, as I am restricted to an old system with some out-of-date custom packages that other parts of my script need, but I would welcome seeing it in pandas or any other package. I see this as a kind of template-matching problem but cannot seem to find an elegant solution. Thank you in advance!
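To make the desired behaviour concrete, here is a minimal numpy-only sketch of one way the matching might generalize to a sequence of any length. It is an illustration under assumptions, not a definitive answer: the function name find_sequence and the separate labels and features arrays are hypothetical names introduced here, and the approach (comparing every window of indices against the sequence) is just one possible technique.

```python
import numpy as np

def find_sequence(labels, sequence):
    """Return a 2-D array of index rows, one row per match of `sequence`.

    Hypothetical helper for illustration only.
    """
    labels = np.asarray(labels)
    sequence = np.asarray(sequence)
    n, m = len(labels), len(sequence)
    if m == 0 or m > n:
        return np.empty((0, m), dtype=int)
    # windows[i, j] = i + j, i.e. row i holds the indices i, i+1, ..., i+m-1
    windows = np.arange(n - m + 1)[:, None] + np.arange(m)
    # A window matches when every label in it equals the sequence element
    matches = (labels[windows] == sequence).all(axis=1)
    return windows[matches]

# Assumed layout: labels and features kept as two parallel 1-D arrays
labels = np.array([121, 131, 113, 131, 223, 242, 212, 131, 113, 131])
features = np.array(list("abcdefghij"))

idx = find_sequence(labels, [131, 113, 131])
print(idx[:, 0])      # starting indices: [1 7]
print(idx)            # full index lists: [[1 2 3] [7 8 9]]
print(features[idx])  # matching features: [['b' 'c' 'd'] ['h' 'i' 'j']]
```

The intermediate index array has (n - m + 1) x m entries, so this trades memory for simplicity; for very long arrays or sequences a different approach may be preferable.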