1

I'm interested in making a distributed dask array out of a bunch of netcdf files I have lying around. I started down the path outlined in "Distributed Dask arrays" but have gotten a little caught up by the deprecation of 'distributed.collections'

What is the best way to create a distributed dask array now? I have my dask-scheduler and dask-worker tasks running. And I can successfully execute the following:

from distributed import Client, progress
client = Client('scheduler-address:8786')
futures = client.map(load_netcdf, filenames)
progress(futures)

What next?

1 Answer 1

2

Use XArray

First, if you have many NetCDF files, then you should take a long look at the XArray package, which wraps Dask.array and manages all of the NetCDF metadata conventions.

http://xarray.pydata.org/en/stable/

In particular I think that you want the open_mfdataset function.

By Hand

If you want to build a dask.array by hand using techniques in that blogpost then you should use the dask.delayed interface and the da.from_delayed function.

http://dask.pydata.org/en/latest/array-creation.html#using-dask-delayed

If you want to use Futures as in that blogpost that's fine, da.from_delayed will accept Futures in place of delayed values.

array_chunks = [da.from_delayed(future, shape=..., dtype=...) 
                for future in futures]
array = da.concatenate(array_chunks, axis=0)
Sign up to request clarification or add additional context in comments.

2 Comments

do i still need to launch dask-scheduler and dask-workers to get XArray to leverage dask? reading through xarray.pydata.org/en/stable/dask.html doesn't make this exactly clear in my mind.
There is no difference in startup between XArray and Dask. You can use the dask.distributed scheduler or the single-machine schedulers. dask.pydata.org/en/latest/scheduler-choice.html

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.