I have a large dataset that I process using xarray+dask for scalability. These libraries work great for all of my calculations, except for one. The final step is to do some statistical bootstrapping (resampling along the largest dimension) and then calculate a variance over it.
The way I do it looks like this:
import numpy as np
import xarray as xr

# Draw bootstrap indices: sample_count resamples, each with sample_size draws along "n"
idx = xr.DataArray(
    np.random.randint(0, projections.n.size, (sample_count, sample_size)),
    dims=("sample", "n"),
)
# For each resample: select along "n", reduce, then stack the per-sample results
bootstrapped_variations = xr.concat(
    [projections.isel(n=i).var(dim="n").sum(dim="ReIm") for i in idx], dim="sample"
).chunk("auto")
This works for some sample sizes, but it does not scale to larger ones and I get out-of-memory errors. I guess the main problem is that so many new arrays are created by the isel calls. The thing is, they should get reduced immediately, since we only calculate their variance.
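For concreteness, here is a minimal dummy setup (the array shape, the extra mode dimension and the sizes are made-up placeholders, not my real data) showing how large each intermediate is before the reduction:

import numpy as np
import xarray as xr

# Dummy stand-in for my real data; "n" is the large bootstrap dimension
projections = xr.DataArray(
    np.random.rand(200_000, 2, 20), dims=("n", "ReIm", "mode")
)
sample_count, sample_size = 500, 200_000

i = xr.DataArray(
    np.random.randint(0, projections.n.size, sample_size), dims="n"
)
one_sample = projections.isel(n=i)  # a full (sample_size, 2, 20) copy is allocated here
print(one_sample.nbytes / 1e9, "GB before var()/sum() reduce it")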
What I would like to do is create a numpy view of such a single sample, so as not to allocate huge new arrays in memory. The problem is that I use advanced indexing here, so a copy is returned.
It can be slower computationally, but I just wonder whether it can be done. Perhaps there are other memory-efficient ways to do bootstrapping on large datasets? From what I've read, np.random.choice also returns a copy, not a view.
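For reference, this is the view-vs-copy behaviour I mean, in plain numpy (nothing xarray-specific):

import numpy as np

a = np.arange(1_000_000)
idx = np.random.randint(0, a.size, 100)

# Basic slicing gives a view that shares memory with the original array ...
print(np.shares_memory(a, a[10:20]))                       # True
# ... but advanced (fancy) indexing always returns a copy,
print(np.shares_memory(a, a[idx]))                         # False
# and so does np.random.choice
print(np.shares_memory(a, np.random.choice(a, size=100)))  # False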