6

I have a 2D numpy array S representing a state space, with 80000000 rows (as states) and 5 columns (as state variables).

I initialize K0 with S, and at each iteration, I apply a state transition function f(x) on all of the states in Ki, and delete states whose f(x) is not in Ki, resulting Ki+1. Until it converges i.e. Ki+1 = Ki.

Going like this would take ages:

K = S
to_delete = [0]
While to_delete:
    to_delete = []
    for i in xrange(len(K)):
        if not f(i) in K:
        to_delete.append(K(i))
    K = delete(K,to_delete,0)

So I wanted to make a vectorized implementation :

slice K in columns, apply f and, join them once again, thus obtaining f(K) somehow.

The question now is how to get an array of length len(K), say Sel, where each row Sel[i] determine whether f(K[i]) is in K. Exactly like the function in1d works.

Then it would be simple to make

K=K[Sel]]

3 Answers 3

5

Your question is difficult to understand because it contains extraneous information and contains typos. If I understand correctly, you simply want an efficient way to perform a set operation on the rows of a 2D array (in this case the intersection of the rows of K and f(K)).

You can do this with numpy.in1d if you create structured array view.

Code:

if this is K:

In [50]: k
Out[50]:
array([[6, 6],
       [3, 7],
       [7, 5],
       [7, 3],
       [1, 3],
       [1, 5],
       [7, 6],
       [3, 8],
       [6, 1],
       [6, 0]])

and this is f(K) (for this example I subtract 1 from the first col and add 1 to the second):

In [51]: k2
Out[51]:
array([[5, 7],
       [2, 8],
       [6, 6],
       [6, 4],
       [0, 4],
       [0, 6],
       [6, 7],
       [2, 9],
       [5, 2],
       [5, 1]])

then you can find all rows in K also found in f(K) by doing something this:

In [55]: k[np.in1d(k.view(dtype='i,i').reshape(k.shape[0]),k2.view(dtype='i,i').
reshape(k2.shape[0]))]
Out[55]: array([[6, 6]])

view and reshape create flat structured views so that each row appears as a single element to in1d. in1d creates a boolean index of k of matched items which is used to fancy index k and return the filtered array.

Sign up to request clarification or add additional context in comments.

4 Comments

Great! but what is dtype = 'i,i' ? Could this method work if my rows are of any type and any number of columns? for instance rows like [-0.5,0.5,1,-6,20].
@amine23 I've put a link to structured array documentation in my answer. Yes, floats, strings, booleans are all allowed fields in a dtype struct.
I get ValueError: total size of new array must be unchanged if I try your example. Haven't sorted out what's changed to break it (I have numpy 1.8.2, python 2.7.6).
Great solution. I had to use dtype='int,int' in my case. Otherwise the structured array was 2x the length of the original [N,2] array.
1

Not sure if I understand your question entirely, but if the interpretation of Paul is correct, it can be solved efficiently and fully vectorized using the numpy_indexed package as such in a single readable line:

import numpy_indexed as npi
K = npi.intersection(K, f(K))

Also, this works for rows of any type or shape.

Comments

0

Above answer is great.

But if one doesn't want to mingle with structured arrays and wants a solution that doesn't care what is the type of your array, nor the dimensions of your array elements, I came up with this :

k[np.in1d(list(map(np.ndarray.dumps, k)), list(map(np.ndarray.dumps, k2)))]

basically, list(map(np.ndarray.dumps, k))instead of k.view(dtype='f8,f8').reshape(k.shape[0]).

Take into account that this solution is ~50 times slower.

k = np.array([[6.5, 6.5],
       [3.5, 7.5],
       [7.5, 5.5],
       [7.5, 3.5],
       [1.5, 3.5],
       [1.5, 5.5],
       [7.5, 6.5],
       [3.5, 8.5],
       [6.5, 1.5],
       [6.5, 0.5]])
k = np.tile(k, (1000, 1))

k2 = np.c_[k[:, 0] - 1, k[:, 1] + 1]


In [132]: k.shape, k2.shape
Out[132]: ((10000, 2), (10000, 2))

In [133]: timeit k[np.in1d(k.view(dtype='f8,f8').reshape(k.shape[0]),k2.view(dtype='f8,f8').reshape(k2.shape[0]))]
10 loops, best of 3: 22.2 ms per loop

In [134]: timeit k[np.in1d(list(map(np.ndarray.dumps, k)), list(map(np.ndarray.dumps, k2)))]
1 loop, best of 3: 892 ms per loop

It can be marginal for small inputs, but for the op's, it will take 1h 20min instead of 2 min.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.