I have a dask-backed xarray `DataArray` of about 150k x 90k elements with a chunk size of 8192 x 8192. I am working on a Windows virtual machine with 100 GB of RAM and 16 cores.
I want to plot it using the Datashader package. From reading the docs, Datashader does support dask-backed xarrays. I am a beginner with Dask, but my (limited) understanding is that since the array is chunked, Datashader should only need to read one chunk at a time (or something to that effect) and do the required processing to produce the plot, so it never has to load the entire array into memory. Moreover, it should be able to leverage parallelism to speed the process up.
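For scale, here is the back-of-the-envelope arithmetic behind why I expected chunk-at-a-time processing to fit comfortably in RAM (assuming float64, i.e. 8 bytes per element):

```python
# Memory arithmetic for the array in question, assuming float64 (8 bytes/element)
full_bytes = 150_000 * 90_000 * 8   # the whole array
chunk_bytes = 8192 * 8192 * 8       # a single 8192 x 8192 chunk

print(full_bytes / 1e9)    # 108.0  -> ~108 GB, larger than the 100 GB of RAM
print(chunk_bytes / 1e9)   # ~0.54  -> one chunk easily fits in memory
```

So fully materializing the array cannot work on this machine, but any one chunk is tiny by comparison.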
However, when I try to create a plot, after about 15 seconds the RAM usage starts increasing steadily (with CPU > 95%) until I manually interrupt the code once it reaches about 90 GB. Part of the error message after interrupting is below.
> KeyboardInterrupt Traceback (most recent call
> last) Cell In[4], line 1
> ----> 1 tf.shade(ds.Canvas(plot_height=300, plot_width=300).raster(dask_xarray))
>
> File
> <PATH>\lib\site-packages\datashader\transfer_functions\__init__.py:717,
> in shade(agg, cmap, color_key, how, alpha, min_alpha, span, name,
> color_baseline, rescale_discrete_levels)
> 713 return _apply_discrete_colorkey(
> 714 agg, color_key, alpha, name, color_baseline
> 715 )
> 716 else:
> --> 717 return _interpolate(agg, cmap, how, alpha, span, min_alpha, name,
> 718 rescale_discrete_levels)
> 719 elif agg.ndim == 3:
> 720 return _colorize(agg, color_key, how, alpha, span, min_alpha, name, color_baseline,
> 721 rescale_discrete_levels)
>
> File
> <PATH>\lib\site-packages\datashader\transfer_functions\__init__.py:256,
> in _interpolate(agg, cmap, how, alpha, span, min_alpha, name,
> rescale_discrete_levels)
> 254 data = agg.data
> 255 if isinstance(data, da.Array):
> --> 256 data = data.compute()
> 257 else:
> 258 data = data.copy()
From the above error message, it seems like `compute` is being run on the xarray, which would perhaps explain why the RAM usage grows so much. If that is the case, I am confused about why `compute` is being run, since I thought the whole point of using dask-backed xarrays was to avoid reading the entire array into memory. I would have thought it should be possible to create this plot even on a regular computer with 16 GB of RAM, with the main limiting factor being the chunk size plus some working space for dask/datashader to use in the background.
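To check my mental model at a smaller scale, here is a dask-only sketch (no Datashader) of what I expected to happen: a reduction over a chunked array is evaluated chunk by chunk, and calling `.compute()` on the *reduced* result only materializes that small result, never the full array.

```python
import dask.array as da

# A lazily evaluated chunked array: nothing is allocated yet.
arr = da.random.random((4000, 4000), chunks=(1000, 1000))
print(arr.nbytes / 1e9)   # 0.128 -> size in GB *if* fully materialized

# A reduction stays lazy until .compute(); dask then processes the
# chunks one at a time and only the small result ends up in memory.
small = arr.mean(axis=0)  # lazy, shape (4000,)
result = small.compute()  # a plain numpy array
print(result.nbytes)      # 32000 bytes, not 0.128 GB
```

My assumption was that Datashader's aggregation would behave like the reduction above, which is why the memory growth surprises me.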
I have almost certainly misunderstood some things. I would much appreciate it if anyone could explain what is happening here and how I can go about plotting this array.
Minimal reproducible example
# import libraries
import numpy as np
import dask.array as da
import datashader as ds
from datashader import transfer_functions as tf
import xarray as xr
# create large dask array
dask_array = da.random.random((100000, 100000), chunks=(1000, 1000))
# convert to dasked xarray
dask_xarray = xr.DataArray(
dask_array,
dims=["x", "y"],
coords={"x": np.arange(100000), "y": np.arange(100000)},
name="example_data" # Name of the data variable
)
# create plot using Datashader
tf.shade(ds.Canvas(plot_height=300, plot_width=300).raster(dask_xarray))
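For reference, the chunk-at-a-time processing I had in mind looks roughly like this hand-rolled numpy sketch (this is my own illustration of the idea, not Datashader's actual implementation): aggregate a large grid onto a small canvas one row block at a time, so only one block is ever in memory.

```python
import numpy as np

# Hypothetical sketch of chunk-wise aggregation: a 4000 x 4000 "source"
# reduced onto a 4 x 4 "canvas", reading one 1000-row block at a time.
src_rows, src_cols = 4000, 4000
block = 1000                  # rows per chunk; each block maps to one canvas row
out = np.zeros((4, 4))        # the tiny canvas

for i in range(0, src_rows, block):
    # Stand-in for "read one chunk from disk":
    chunk = np.random.random((block, src_cols))
    row_out = i * 4 // src_rows
    # Collapse the block into 4 coarse cells (mean over each sub-grid):
    out[row_out] += chunk.reshape(block, 4, src_cols // 4).mean(axis=(0, 2))

print(out)  # each cell ~0.5 for uniform random input
```

Peak memory here is one block plus the canvas, which is the behavior I expected from `Canvas.raster` on a chunked array.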