
So I have a 2D numpy array arr. It's a relatively big one: arr.shape = (2400, 60000)

What I'm currently doing is the following:

  • randomly (with replacement) select arr.shape[0] indices
  • access the chosen rows of arr
  • calculate column-wise means and take the max value
  • repeat this k times

It looks something like:

no_rows = arr.shape[0]
indices = np.arange(no_rows)
my_vals = []
for k in range(no_samples):
    random_idxs = np.random.choice(indices, size=no_rows, replace=True)
    my_vals.append(
        arr[random_idxs].mean(axis=0).max()
    )

My problem is that this is very slow. With my arr size, one loop iteration takes ~3 s. Since I want a sample bigger than 1k, my current solution is pretty bad (1k * ~3 s -> ~1 h). I've profiled it and the bottleneck is accessing the rows by index; mean and max are fast, and np.random.choice is also fine.
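The profiling result is consistent with how NumPy handles this kind of indexing: indexing with an integer array (fancy indexing) always materializes a copy of the selected rows, whereas basic slicing returns a view. A minimal sketch to see the difference (the array here is just a small stand-in):

```python
import numpy as np

arr = np.zeros((4, 5))
view = arr[1:3]                    # basic slice: a view, no data copied
fancy = arr[np.array([1, 2])]      # fancy indexing: a fresh copy

assert np.shares_memory(arr, view)        # the slice shares arr's buffer
assert not np.shares_memory(arr, fancy)   # the fancy-indexed result does not
```

So each iteration of the loop above copies the full 2400 x 60000 selection before averaging it.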

Do you see any room for improvement? A more efficient way of accessing the indexed rows, or better yet, a faster approach that avoids the problem altogether?

What I tried so far:

  • numpy.take (slower)
  • numpy.ravel :

something similar to:

random_idxs = np.random.choice(sample_idxs, size=sample_size, replace=True)
test = arr.ravel()[random_idxs.ravel()].reshape(arr.shape)
  • a similar approach to the current one, but without the loop: I created a 3D array and accessed the rows across the additional dimension in one go
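For reference, the loop-free variant described in the last bullet might look something like this (a sketch with a small stand-in array; arr, no_samples, etc. stand for the question's variables):

```python
import numpy as np

rng = np.random.default_rng(0)
arr = rng.random((6, 10))           # small stand-in for the real (2400, 60000) array
no_samples, no_rows = 4, arr.shape[0]

# one (no_rows,) index vector per sample, stacked into a 2-D index array
idx = rng.integers(0, no_rows, size=(no_samples, no_rows))

# arr[idx] has shape (no_samples, no_rows, n_cols): every resample at once
vals = arr[idx].mean(axis=1).max(axis=1)
```

At the question's sizes, arr[idx] would need roughly no_samples * 2400 * 60000 * 8 bytes, so this trades the loop for memory rather than removing the copying cost.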
  • What is no_samples in your code? Commented Jul 30, 2020 at 8:47
  • no_samples is just an integer defining the sample size. I want it to be >1k Commented Jul 30, 2020 at 9:05

2 Answers


Since advanced indexing generates a copy, the program allocates a huge amount of memory in arr[random_idxs].

So one of the simplest ways to improve efficiency is to do things batch-wise:

BATCH = 512
max(arr[random_idxs,i:i+BATCH].mean(axis=0).max() for i in range(0,arr.shape[1],BATCH))
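Dropped into the question's loop, the batched version might look like this (a sketch; BATCH and the stand-in array sizes are placeholders to tune):

```python
import numpy as np

rng = np.random.default_rng(42)
arr = rng.random((8, 20))           # stand-in for the real (2400, 60000) array
no_rows, no_samples, BATCH = arr.shape[0], 5, 7

my_vals = []
for _ in range(no_samples):
    random_idxs = rng.integers(0, no_rows, size=no_rows)
    # walk over the columns in chunks, so only a (no_rows, BATCH) copy
    # is ever materialized instead of a full copy of arr
    my_vals.append(max(
        arr[random_idxs, i:i + BATCH].mean(axis=0).max()
        for i in range(0, arr.shape[1], BATCH)
    ))
```

The max over per-batch column maxima equals the max over all columns, so the result is identical to the unbatched arr[random_idxs].mean(axis=0).max().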

1 Comment

Thank you - that indeed helps! I managed to speed it up ~3.5x with the batch approach. A single iteration went from ~3 s down to ~0.79 s. I'd add only that it's worth playing with the batch size. I ran several tests; here are the results as (BATCH, avg_time_over_100_iterations): (16, 1.13), (32, 0.87), (64, 0.79), (128, 1.2), (256, 1.33), (512, 1.8), (1024, 2.47)

This is not a general solution to the problem, but it should make your specific problem much faster. Basically, arr.mean(axis=0).max() won't change, so why not take random samples from that array?

Something like:

mean_max = arr.mean(axis=0).max()
my_vals = np.array([np.random.choice(mean_max, size=len(mean_max), replace=True) for i in range(no_samples)])

You may even be able to do: my_vals = np.random.choice(mean_max, size=(no_samples, len(mean_max)), replace=True), but I'm not sure how, if at all, that would change your statistics.

1 Comment

Not sure if I follow. arr.mean(axis=0).max() WILL change - that's the whole point. I'm choosing random indices with replacement, so every iteration the mean of each column should be different. Then I take the highest of those column means and append it as a data point to my final list. As for your solution - it won't run. The mean_max you proposed is a single float: the highest of the column-wise means of arr. So you're trying to randomly choose from a single available value.
