
I have a dask-backed xarray of about 150k x 90k with a chunk size of 8192 x 8192. I am working on a Windows virtual machine which has 100 GB of RAM and 16 cores.

I want to plot it using the Datashader package. From reading the docs, Datashader does support dask-backed xarrays. I am a beginner with Dask, though my (limited) understanding is that since the xarray is chunked, Datashader should only need to read one chunk at a time (or something to that effect) and do the required processing to produce the plot, so it doesn't need to read the entire xarray into memory. Moreover, it should be able to leverage parallelism to speed this process up.
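For what it's worth, that chunk-by-chunk behaviour is exactly what dask itself does for reductions. Here is a small sketch (toy sizes, not my real array) showing that a reduction over a chunked array never needs the whole array in memory at once:

```python
import dask.array as da

# Toy stand-in for the real array: this is lazy, nothing is allocated yet.
arr = da.random.random((8000, 8000), chunks=(1000, 1000))  # ~512 MB if materialized

# A reduction like mean() is evaluated chunk by chunk, so peak memory
# stays near a few chunks' worth rather than the full array.
result = arr.mean().compute()
print(result)  # close to 0.5 for uniform random data
```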

However, when I try to create a plot, after about 15 seconds the RAM usage starts increasing steadily (with CPU > 95%) until I eventually interrupt the code manually when it reaches about 90 GB. After interrupting, part of the traceback is shown below.

> KeyboardInterrupt                         Traceback (most recent call
> last) Cell In[4], line 1
> ----> 1 tf.shade(ds.Canvas(plot_height=300, plot_width=300).raster(dask_xarray))
> 
> File
> <PATH>\lib\site-packages\datashader\transfer_functions\__init__.py:717,
> in shade(agg, cmap, color_key, how, alpha, min_alpha, span, name,
> color_baseline, rescale_discrete_levels)
>     713         return _apply_discrete_colorkey(
>     714             agg, color_key, alpha, name, color_baseline
>     715         )
>     716     else:
> --> 717         return _interpolate(agg, cmap, how, alpha, span, min_alpha, name,
>     718                             rescale_discrete_levels)
>     719 elif agg.ndim == 3:
>     720     return _colorize(agg, color_key, how, alpha, span, min_alpha, name, color_baseline,
>     721                      rescale_discrete_levels)
> 
> File
> <PATH>\lib\site-packages\datashader\transfer_functions\__init__.py:256,
> in _interpolate(agg, cmap, how, alpha, span, min_alpha, name,
> rescale_discrete_levels)
>     254 data = agg.data
>     255 if isinstance(data, da.Array):
> --> 256     data = data.compute()
>     257 else:
>     258     data = data.copy()

From the above traceback, it seems that compute is being run on the underlying dask array, which would perhaps explain why the RAM usage grows so much. If that is the case, I am confused as to why compute is being run, since I thought the whole point of using dask-backed xarrays was to avoid loading the entire array into memory. I would have thought it should be possible to create this plot even on a regular computer with 16 GB of RAM, with the main limiting factor being the chunk size plus some extra space for dask/datashader to work in the background.
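For context on what that `data.compute()` line does: calling `.compute()` on a dask array evaluates every chunk and assembles them into a single in-memory numpy array, which is presumably where the memory goes. A toy sketch (not the real 150k x 90k case):

```python
import dask.array as da
import numpy as np

arr = da.random.random((4000, 4000), chunks=(1000, 1000))

# compute() evaluates all chunks and concatenates them into one
# in-memory numpy array. For the full-size array that would need
# roughly 150000 * 90000 * 8 bytes, i.e. on the order of 100 GB,
# which matches the ~90 GB of RAM observed before interrupting.
materialized = arr.compute()
print(type(materialized))   # <class 'numpy.ndarray'>
print(materialized.nbytes)  # 4000 * 4000 * 8 = 128000000 bytes
```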

I have almost certainly misunderstood something. It would be much appreciated if anyone could explain what is happening here and how I can go about plotting this array.

Minimal reproducible example

# import libraries
import numpy as np
import dask.array as da
import datashader as ds
from datashader import transfer_functions as tf
import xarray as xr

# create large dask array
dask_array = da.random.random((100000, 100000), chunks=(1000, 1000))
# convert to dasked xarray
dask_xarray = xr.DataArray(
    dask_array,
    dims=["x", "y"], 
    coords={"x": np.arange(100000), "y": np.arange(100000)},
    name="example_data"  # Name of the data variable
)
# create plot using Datashader
tf.shade(ds.Canvas(plot_height=300, plot_width=300).raster(dask_xarray))
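In case it is useful to others hitting the same wall: one possible stop-gap (a sketch of mine, not part of the original question) is to downsample the array with xarray's `coarsen` before handing it to Datashader, so the heavy reduction stays chunked inside dask and only a small array ever reaches `tf.shade`:

```python
import dask.array as da
import xarray as xr

# Toy stand-in for the real array (hypothetical sizes).
dask_xarray = xr.DataArray(
    da.random.random((4000, 4000), chunks=(1000, 1000)),
    dims=["x", "y"],
    name="example_data",
)

# Block-average 10x10 windows; dask evaluates the reduction chunk by
# chunk, so only the small coarsened result is ever held in memory.
coarse = dask_xarray.coarsen(x=10, y=10).mean().compute()
print(coarse.shape)  # (400, 400)
```

The coarsened result is a plain in-memory DataArray, so `Canvas.raster` and `tf.shade` can be applied to it as in the example above without triggering a full-array compute.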

1 Answer
Can you try https://github.com/holoviz/datashader/pull/1448 to see if it fixes it for you?
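If you want to try the proposed fix before a release includes it: GitHub exposes every pull request under `refs/pull/<N>/head`, so pip can install that revision directly (a sketch, assuming git is available on the machine):

```shell
# Install datashader at the tip of pull request #1448 (requires git).
pip install "git+https://github.com/holoviz/datashader.git@refs/pull/1448/head"
```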


2 Comments

Thanks for the link to the pull request. I tried running the code again, but I am now getting a different error relating to the @ngjit decorator. I have added a comment on the linked PR with the full error message.
It seems this ngjit error occurs when using the latest version of the Datashader package. When I use version 0.16.1, which is the one I was originally using, the change from the link fixes the issue.
