I need to slice a moderately sized 2D NumPy array along two dimensions. As an example,
import numpy as np
X = np.random.normal(loc=0, scale=1, size=(3000, 100))
From this array, I need to select a large number of rows and a rather small number of columns, e.g.
row_idx = np.random.randint(0, 3000, 2500)
col_idx = np.random.randint(0, 100, 10)
Right now, I do this with the following command:
X.take(col_idx, axis=1).take(row_idx, axis=0)
This takes approximately 115µs on my computer. The problem is that I need to do this step several million times per run.
Do you see any chance to speed this up?
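For reference, a single fancy index built with `np.ix_` selects both axes in one pass and gives the same result as the two chained `take` calls; whether it is actually faster depends on the shapes involved, so it is worth timing on your machine. A minimal sketch (using `default_rng` to generate the indices):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(loc=0, scale=1, size=(3000, 100))
row_idx = rng.integers(0, 3000, 2500)
col_idx = rng.integers(0, 100, 10)

# chained take, as in the question: two intermediate copies
a = X.take(col_idx, axis=1).take(row_idx, axis=0)

# open-mesh fancy indexing: one pass over X, one output copy
b = X[np.ix_(row_idx, col_idx)]

assert np.array_equal(a, b)
print(b.shape)  # (2500, 10)
```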
Edit (additional information): I have a matrix X which is n×k. The n rows contain 1×k vectors. There are three sets: an active set (V), a left set (L), and a right set (R). Moreover, there are coefficients ν₀ and ν. I need to compute this quantity:

$$(1-\tau)\sum_{i\in L}\left(\nu_0 + \sum_{j\in V}\nu_j x_{ij}\right) - \tau\sum_{i\in R}\left(\nu_0 + \sum_{j\in V}\nu_j x_{ij}\right)$$

The slicing from the question selects all rows of X which are in the left (right) set and all columns that are in the active set.
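In case it helps: the quantity above can be evaluated as two matrix–vector products over the selected submatrices. This is only a sketch with made-up index sets (`L_idx`, `R_idx`, `V_idx`) and coefficients standing in for the real L, R, V, ν₀, ν:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(3000, 100))
L_idx = rng.integers(0, 3000, 1200)   # left set (hypothetical)
R_idx = rng.integers(0, 3000, 1300)   # right set (hypothetical)
V_idx = rng.integers(0, 100, 10)      # active set (hypothetical)
v0, tau = 0.5, 0.3
v = rng.normal(size=V_idx.size)       # coefficients nu_j

def inner(rows):
    # nu_0 + sum_{j in V} nu_j * x_ij for each selected row i
    return v0 + X[np.ix_(rows, V_idx)] @ v

result = (1 - tau) * inner(L_idx).sum() - tau * inner(R_idx).sum()
```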
Edit 2
I found another small improvement.
X.take(col_idx, axis=1, mode='clip').take(row_idx, axis=0, mode='clip')
is a little faster (roughly 25% on my machine).
The take() method needs to copy the selected rows and columns. You should adjust your algorithm to make that copy unnecessary. We can't tell you how to do that without further context.
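Given the formula in the edit, one way to cut the copying, assuming the active set V changes less often than the row sets, is to precompute the per-row inner term once; afterwards each L/R selection only indexes a 1-D array instead of slicing X along both axes. A sketch with hypothetical names:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(3000, 100))
V_idx = rng.integers(0, 100, 10)      # active set (hypothetical)
v0, tau = 0.5, 0.3
v = rng.normal(size=V_idx.size)       # coefficients nu_j

# one (n,) vector: nu_0 + sum_{j in V} nu_j * x_ij for every row i,
# reusable for any number of L/R selections while V stays fixed
s = v0 + X[:, V_idx] @ v

L_idx = rng.integers(0, 3000, 1200)   # left set (hypothetical)
R_idx = rng.integers(0, 3000, 1300)   # right set (hypothetical)
result = (1 - tau) * s[L_idx].sum() - tau * s[R_idx].sum()
```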