Implementation of numpy in1d for 2D arrays?

Question

I have a 2D numpy array S representing a state space, with 80000000 rows (as states) and 5 columns (as state variables).

I initialize K0 with S, and at each iteration, I apply a state transition function f(x) on all of the states in Ki, and delete states whose f(x) is not in Ki, resulting Ki+1. Until it converges i.e. Ki+1 = Ki.

Going like this would take ages:

K = S
to_delete = [0]
While to_delete:
    to_delete = []
    for i in xrange(len(K)):
        if not f(i) in K:
        to_delete.append(K(i))
    K = delete(K,to_delete,0)

So I wanted to make a vectorized implementation :

slice K in columns, apply f and, join them once again, thus obtaining f(K) somehow.

The question now is how to get an array of length len(K), say Sel, where each row Sel[i] determine whether f(K[i]) is in K. Exactly like the function in1d works.

Then it would be simple to make

K=K[Sel]]

Paul · Accepted Answer · 2013-04-26 17:22:05Z

5

Your question is difficult to understand because it contains extraneous information and contains typos. If I understand correctly, you simply want an efficient way to perform a set operation on the rows of a 2D array (in this case the intersection of the rows of K and f(K)).

You can do this with numpy.in1d if you create structured array view.

Code:

if this is K:

In [50]: k
Out[50]:
array([[6, 6],
       [3, 7],
       [7, 5],
       [7, 3],
       [1, 3],
       [1, 5],
       [7, 6],
       [3, 8],
       [6, 1],
       [6, 0]])

and this is f(K) (for this example I subtract 1 from the first col and add 1 to the second):

In [51]: k2
Out[51]:
array([[5, 7],
       [2, 8],
       [6, 6],
       [6, 4],
       [0, 4],
       [0, 6],
       [6, 7],
       [2, 9],
       [5, 2],
       [5, 1]])

then you can find all rows in K also found in f(K) by doing something this:

In [55]: k[np.in1d(k.view(dtype='i,i').reshape(k.shape[0]),k2.view(dtype='i,i').
reshape(k2.shape[0]))]
Out[55]: array([[6, 6]])

view and reshape create flat structured views so that each row appears as a single element to in1d. in1d creates a boolean index of k of matched items which is used to fancy index k and return the filtered array.

edited Apr 26, 2013 at 17:22

answered Apr 26, 2013 at 16:21

Paul

43.9k17 gold badges112 silver badges126 bronze badges

Sign up to request clarification or add additional context in comments.

4 Comments

amine23 Over a year ago

Great! but what is dtype = 'i,i' ? Could this method work if my rows are of any type and any number of columns? for instance rows like [-0.5,0.5,1,-6,20].

Paul Over a year ago

@amine23 I've put a link to structured array documentation in my answer. Yes, floats, strings, booleans are all allowed fields in a dtype struct.

Kyle Over a year ago

I get ValueError: total size of new array must be unchanged if I try your example. Haven't sorted out what's changed to break it (I have numpy 1.8.2, python 2.7.6).

Andrew Over a year ago

Great solution. I had to use dtype='int,int' in my case. Otherwise the structured array was 2x the length of the original [N,2] array.

Eelco Hoogendoorn · Accepted Answer · 2016-07-27 21:12:45Z

1

Not sure if I understand your question entirely, but if the interpretation of Paul is correct, it can be solved efficiently and fully vectorized using the numpy_indexed package as such in a single readable line:

import numpy_indexed as npi
K = npi.intersection(K, f(K))

Also, this works for rows of any type or shape.

answered Jul 27, 2016 at 21:12

Eelco Hoogendoorn

10.8k1 gold badge46 silver badges43 bronze badges

Comments

Louis R · Accepted Answer · 2016-07-27 20:25:56Z

Above answer is great.

But if one doesn't want to mingle with structured arrays and wants a solution that doesn't care what is the type of your array, nor the dimensions of your array elements, I came up with this :

k[np.in1d(list(map(np.ndarray.dumps, k)), list(map(np.ndarray.dumps, k2)))]

basically, list(map(np.ndarray.dumps, k))instead of k.view(dtype='f8,f8').reshape(k.shape[0]).

Take into account that this solution is ~50 times slower.

k = np.array([[6.5, 6.5],
       [3.5, 7.5],
       [7.5, 5.5],
       [7.5, 3.5],
       [1.5, 3.5],
       [1.5, 5.5],
       [7.5, 6.5],
       [3.5, 8.5],
       [6.5, 1.5],
       [6.5, 0.5]])
k = np.tile(k, (1000, 1))

k2 = np.c_[k[:, 0] - 1, k[:, 1] + 1]


In [132]: k.shape, k2.shape
Out[132]: ((10000, 2), (10000, 2))

In [133]: timeit k[np.in1d(k.view(dtype='f8,f8').reshape(k.shape[0]),k2.view(dtype='f8,f8').reshape(k2.shape[0]))]
10 loops, best of 3: 22.2 ms per loop

In [134]: timeit k[np.in1d(list(map(np.ndarray.dumps, k)), list(map(np.ndarray.dumps, k2)))]
1 loop, best of 3: 892 ms per loop

It can be marginal for small inputs, but for the op's, it will take 1h 20min instead of 2 min.

Collectives™ on Stack Overflow

Implementation of numpy in1d for 2D arrays?

3 Answers 3

4 Comments

Comments

Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

3 Answers 3

4 Comments

Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related