I need to slice a moderately sized 2D NumPy array along two dimensions. As an example,
import numpy as np
X = np.random.normal(loc=0, scale=1, size=(3000, 100))
From this array, I need to select a large number of rows and a rather small number of columns, e.g.
row_idx = np.random.randint(0, 3000, 2500)
col_idx = np.random.randint(0, 100, 10)
Right now, I do this with the following command:
X.take(col_idx, axis=1).take(row_idx, axis=0)
This takes approximately 115µs on my computer. The problem is that I need to do this step several million times per run.
Do you see any chance to speed this up?
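For reference, a single fancy index built with `np.ix_` selects both axes in one pass and gives the same result as the two chained `take` calls; whether it is actually faster depends on the shapes involved, so it is worth timing on your machine. A minimal sketch (using `default_rng` to generate the indices):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(loc=0, scale=1, size=(3000, 100))
row_idx = rng.integers(0, 3000, 2500)
col_idx = rng.integers(0, 100, 10)

# chained take, as in the question: two intermediate copies
a = X.take(col_idx, axis=1).take(row_idx, axis=0)

# open-mesh fancy indexing: one pass over X, one output copy
b = X[np.ix_(row_idx, col_idx)]

assert np.array_equal(a, b)
print(b.shape)  # (2500, 10)
```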
Edit (additional information): I have a matrix X which is n×k. The n rows contain 1×k vectors. There are three sets: an active set (V), a left set (L), and a right set (R). Moreover, there are coefficients ν₀ and ν. I need to compute this quantity:

$$(1-\tau)\sum_{i\in L}\left(\nu_0 + \sum_{j\in V}\nu_j x_{ij}\right) - \tau\sum_{i\in R}\left(\nu_0 + \sum_{j\in V}\nu_j x_{ij}\right)$$

The slicing from the question selects all rows of X which are in the left (right) set and all columns that are in the active set.
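In case it helps: the quantity above can be evaluated as two matrix–vector products over the selected submatrices. This is only a sketch with made-up index sets (`L_idx`, `R_idx`, `V_idx`) and coefficients standing in for the real L, R, V, ν₀, ν:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(3000, 100))
L_idx = rng.integers(0, 3000, 1200)   # left set (hypothetical)
R_idx = rng.integers(0, 3000, 1300)   # right set (hypothetical)
V_idx = rng.integers(0, 100, 10)      # active set (hypothetical)
v0, tau = 0.5, 0.3
v = rng.normal(size=V_idx.size)       # coefficients nu_j

def inner(rows):
    # nu_0 + sum_{j in V} nu_j * x_ij for each selected row i
    return v0 + X[np.ix_(rows, V_idx)] @ v

result = (1 - tau) * inner(L_idx).sum() - tau * inner(R_idx).sum()
```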
Edit 2
I found another small improvement.
X.take(col_idx, axis=1, mode='clip').take(row_idx, axis=0, mode='clip')
is a little faster (roughly 25% on my machine).
The take() method needs to copy the selected rows and columns. You should adjust your algorithm to make that copy unnecessary. We can't tell you how to do that without further context.
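Given the formula in the edit, one way to cut the copying, assuming the active set V changes less often than the row sets, is to precompute the per-row inner term once; afterwards each L/R selection only indexes a 1-D array instead of slicing X along both axes. A sketch with hypothetical names:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(3000, 100))
V_idx = rng.integers(0, 100, 10)      # active set (hypothetical)
v0, tau = 0.5, 0.3
v = rng.normal(size=V_idx.size)       # coefficients nu_j

# one (n,) vector: nu_0 + sum_{j in V} nu_j * x_ij for every row i,
# reusable for any number of L/R selections while V stays fixed
s = v0 + X[:, V_idx] @ v

L_idx = rng.integers(0, 3000, 1200)   # left set (hypothetical)
R_idx = rng.integers(0, 3000, 1300)   # right set (hypothetical)
result = (1 - tau) * s[L_idx].sum() - tau * s[R_idx].sum()
```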