Creating distributed dask arrays

Question

I'm interested in making a distributed dask array out of a bunch of netcdf files I have lying around. I started down the path outlined in "Distributed Dask arrays" but have gotten a little caught up by the deprecation of 'distributed.collections'

What is the best way to create a distributed dask array now? I have my dask-scheduler and dask-worker tasks running. And I can successfully execute the following:

from distributed import Client, progress
client = Client('scheduler-address:8786')
futures = client.map(load_netcdf, filenames)
progress(futures)

What next?

MRocklin · Accepted Answer · 2017-09-15 18:54:18Z

2

Use XArray

First, if you have many NetCDF files, then you should take a long look at the XArray package, which wraps Dask.array and manages all of the NetCDF metadata conventions.

http://xarray.pydata.org/en/stable/

In particular I think that you want the open_mfdataset function.

By Hand

If you want to build a dask.array by hand using techniques in that blogpost then you should use the dask.delayed interface and the da.from_delayed function.

http://dask.pydata.org/en/latest/array-creation.html#using-dask-delayed

If you want to use Futures as in that blogpost that's fine, da.from_delayed will accept Futures in place of delayed values.

array_chunks = [da.from_delayed(future, shape=..., dtype=...) 
                for future in futures]
array = da.concatenate(array_chunks, axis=0)

answered Sep 15, 2017 at 18:54

MRocklin

57.5k29 gold badges176 silver badges245 bronze badges

Sign up to request clarification or add additional context in comments.

2 Comments

badgley Over a year ago

do i still need to launch dask-scheduler and dask-workers to get XArray to leverage dask? reading through xarray.pydata.org/en/stable/dask.html doesn't make this exactly clear in my mind.

MRocklin Over a year ago

There is no difference in startup between XArray and Dask. You can use the dask.distributed scheduler or the single-machine schedulers. dask.pydata.org/en/latest/scheduler-choice.html

Collectives™ on Stack Overflow

Creating distributed dask arrays

1 Answer 1

Use XArray

By Hand

2 Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

Use XArray

By Hand

2 Comments

Your Answer

Sign up or log in

Post as a guest

Related