Indices in a numpy array where slice in another array

Question

The actual problem is in some machine learning application, and the data gets a little complex. So here's an MWE that captures the essence of the problem:

I have two arrays made as follows:

L = np.arange(12).reshape(4,3)
M = np.arange(12).reshape(6,2)

Now, I want to find the rows R in L, such that there exists some row in M that is made up of all the elements in R except the last one.

From the above example code, L and M look like this:

array([[ 0,  1,  2],  # L
       [ 3,  4,  5],
       [ 6,  7,  8],
       [ 9, 10, 11]])

array([[ 0,  1],  # M
       [ 2,  3],
       [ 4,  5],
       [ 6,  7],
       [ 8,  9],
       [10, 11]])

I would like from these, the marked rows in L, as a numpy array:

array([[ 0,  1,  2],
       [ 6,  7,  8]])

If I were representing L and M as python lists, I would have done this:

L = [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10, 11]]
M = [[0, 1], [2, 3], [4, 5], [6, 7], [8, 9], [10, 11]]
answer = [R for R in L if R[:-1] in M]

Now, I know that I can use a similar list comprehension in numpy and cast the result into an array, numpy being as awesome as it is, probably has a more elegant way to do this that I don't know about.

I tried looking into np.where (to get the required indices, which I can then subscipt L with), but that doesn't seem to do what I need.

I'd appreciate any help

If a row of M contains the first two elements of a row of L, it can't contain the last element, because it's out of room. Will this also be true in your actual application? — user2357112
– user2357112, Commented Mar 22, 2014 at 21:42
@user2357112: absolutely true. This is why I am testing for "some row in M that is made up of all the elements in R except the last one". The last element of a row in L is an added dimension based on some extra computation — inspectorG4dget
– inspectorG4dget, Commented Mar 22, 2014 at 21:45
When I see problems like this, I wish NumPy's set operations weren't all 1D-only. I think you might be able to do something by putting together several sort and in1d calls. The idea is vague right now, though. — user2357112
– user2357112, Commented Mar 22, 2014 at 21:51
@user2357112: that's beyond my numpy-fu. I'd appreciate an example — inspectorG4dget
– inspectorG4dget, Commented Mar 22, 2014 at 21:53

Bitwise · Accepted Answer · 2014-03-23 00:29:15Z

4

Ok, I think I got it. The trick is to add another dimension to M, and then you can use broadcasting:

M.shape += (1,)
E = np.all(L[:,:-1].T == M, 1)

and you get a 6x4 boolean matrix E that gives you the results of comparing all rows of L with all rows of M.

From here it is easy to finish:

result = L[np.any(E,0)]

This way the solution is streamlined and you don't need any lambda functions or "implicit loops" (e.g. np.apply_along_axis()).

Yes, numpy vectorization is beautiful (but you have to think quite abstract sometimes)...

edited Mar 23, 2014 at 0:29

answered Mar 22, 2014 at 22:19

Bitwise

7,8357 gold badges39 silver badges53 bronze badges

Sign up to request clarification or add additional context in comments.

3 Comments

inspectorG4dget Over a year ago

Either I don't get this, or it doesn't work. Could you please implement the easy finish to show me how I could get the desired values?

Bitwise Over a year ago

Ok I made it explicit - I usually try not to give an answer that people can just copy and paste.

ddelemeny Over a year ago

Awesome answer ! I updated mine with a detailed explanation of the array broadcast magic. Thanks for increasing my numpy-fu ! :)

ddelemeny · Accepted Answer · 2014-03-23 00:38:16Z

Quite similar to Bitwise's answer :

def fn(a):
    return lambda b: np.all(a==b, axis=1)
matches = np.apply_along_axis(fn(M), 1, L[:,:2])
result = L[np.any(matches, axis=1)]

What happens under the hood is something like this (I'll use Bitwise's example, which is easier to demonstrate) :

>>> M
array([[ 0,  1],
       [ 2,  3],
       [ 4,  5],
       [ 6,  7],
       [ 8,  9],
       [10, 11]])
>>> M.shape+=(1,)
>>> M
array([[[ 0],
        [ 1]],

       [[ 2],
        [ 3]],

       [[ 4],
        [ 5]],

       [[ 6],
        [ 7]],

       [[ 8],
        [ 9]],

       [[10],
        [11]]])

Here we have added another dimension to the M array, which is now (6,2,1).

>>> L2 = L[:,:-1].T

Then we get rid of the last column of 2, and transpose the array, so that the dimension is (2,4)

And here is the magic, M and L2 are now broadcastable to arrays of dimension (6,2,4).

As numpy's doc states :

A set of arrays is called “broadcastable” to the same shape if the above rules produce a valid result, i.e., one of the following is true:
The arrays all have exactly the same shape.
The arrays all have the same number of dimensions and the length of each dimensions is either a common length or 1.
The arrays that have too few dimensions can have their shapes prepended with a dimension of length 1 to satisfy property 2.
Example

If a.shape is (5,1), b.shape is (1,6), c.shape is (6,) and d.shape is () so that d is a scalar, then a, b, c, and d are all broadcastable to dimension (5,6); and
a acts like a (5,6) array where a[:,0] is broadcast to the other columns,
b acts like a (5,6) array where b[0,:] is broadcast to the other rows,
c acts like a (1,6) array and therefore like a (5,6) array where c[:] is broadcast to every row, and finally,
d acts like a (5,6) array where the single value is repeated.

M[:,:,0] will be repeated 4 times to fill the 3 dim, and L2 will be prepended a new dimension and be repeated 6 times to fill it.

>>> B = np.broadcast_arrays(L2,M)
>>> B
[array([[[ 0,  3,  6,  9],
        [ 1,  4,  7, 10]],

       [[ 0,  3,  6,  9],
        [ 1,  4,  7, 10]],

       [[ 0,  3,  6,  9],
        [ 1,  4,  7, 10]],

       [[ 0,  3,  6,  9],
        [ 1,  4,  7, 10]],

       [[ 0,  3,  6,  9],
        [ 1,  4,  7, 10]],

       [[ 0,  3,  6,  9],
        [ 1,  4,  7, 10]]]),


array([[[ 0,  0,  0,  0],
        [ 1,  1,  1,  1]],

       [[ 2,  2,  2,  2],
        [ 3,  3,  3,  3]],

       [[ 4,  4,  4,  4],
        [ 5,  5,  5,  5]],

       [[ 6,  6,  6,  6],
        [ 7,  7,  7,  7]],

       [[ 8,  8,  8,  8],
        [ 9,  9,  9,  9]],

       [[10, 10, 10, 10],
        [11, 11, 11, 11]]])]

We can now compare them element-wise :

>>> np.equal(*B)
array([[[ True, False, False, False],
        [ True, False, False, False]],

       [[False, False, False, False],
        [False, False, False, False]],

       [[False, False, False, False],
        [False, False, False, False]],

       [[False, False,  True, False],
        [False, False,  True, False]],

       [[False, False, False, False],
        [False, False, False, False]],

       [[False, False, False, False],
        [False, False, False, False]]], dtype=bool)

Row to row (axis = 1):

>>> np.all(np.equal(*B), axis=1)
array([[ True, False, False, False],
       [False, False, False, False],
       [False, False, False, False],
       [False, False,  True, False],
       [False, False, False, False],
       [False, False, False, False]], dtype=bool)

Aggregate on L's :

>>> C = np.any(np.all(np.equal(*B), axis=1), axis=0)
>>> C
array([ True, False,  True, False], dtype=bool)

And this gives you the boolean mask to apply to L.

>>> L[C]
array([[0, 1, 2],
       [6, 7, 8]])

apply_along_axis will leverage the same feature, but reducing L's dimension instead of increasing M's (thus adding implicit loops).

Thank you for such a detailed post. I'm going to stay with behzad.nouri's answer, just for its simplicity. But I really like yours for how the amount of detail and teaching me a few more things. I'm bookmarking this for future reference

behzad.nouri · Accepted Answer · 2014-03-22 22:45:04Z

0

>>> import hashlib
>>> fn = lambda xs: hashlib.sha1(xs).hexdigest()
>>> m = np.apply_along_axis(fn, 1, M)
>>> l = np.apply_along_axis(fn, 1, L[:,:-1])
>>> L[np.in1d(l, m)]
array([[0, 1, 2],
       [6, 7, 8]])

edited Mar 22, 2014 at 22:45

answered Mar 22, 2014 at 22:29

behzad.nouri

78.5k18 gold badges130 silver badges127 bronze badges

2 Comments

inspectorG4dget Over a year ago

Could you add an explanation please? I don't fully understand the solution

behzad.nouri Over a year ago

@inspectorG4dget since in1d only works on 1 dimensional arrays, i am hashing the rows, and then apply in1d.

wwii · Accepted Answer · 2014-03-23 18:22:28Z

0

>>> print np.array([row for row in L if row[:-1] in M])
[[0 1 2]
 [6 7 8]]

edited Mar 23, 2014 at 18:22

answered Mar 22, 2014 at 23:05

wwii

23.9k7 gold badges42 silver badges81 bronze badges

Collectives™ on Stack Overflow

Indices in a numpy array where slice in another array

4 Answers 4

3 Comments

1 Comment

2 Comments

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

4 Answers 4

3 Comments

1 Comment

2 Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Related