Is it possible to load partial chunks of a DataArray (stored as a single netCDF file) from disk into memory, i.e. without loading the whole DataArray at once, but also without using dask-backed DataArrays?
The issue is that I'm using dask as my cluster scheduler to submit jobs, and within those jobs I want to page a DataArray into memory from disk in small pieces. Unfortunately dask does not like nested dask schedulers, so trying to load the DataArray via

da = xr.open_dataarray(file, chunks={'time': 1000})

doesn't work: it causes dask to throw nested daemonic process errors.
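For context, here is a minimal sketch of my failing setup; process_file, the file name, and the reduction inside it are hypothetical stand-ins for my actual jobs:

import xarray as xr
from dask.distributed import Client

def process_file(path):
    # Opening with chunks creates a dask-backed array inside a dask
    # worker, i.e. a nested scheduler; this is the call that triggers
    # the nested daemonic process errors in my setup.
    da = xr.open_dataarray(path, chunks={'time': 1000})
    return float(da.mean())

client = Client()  # the cluster scheduler I use to submit jobs
future = client.submit(process_file, 'my_file.nc')
result = future.result()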
Ideally, I'd like to do something like the following, so that only the relevant pieces are ever loaded into memory rather than the whole DataArray:
import xarray as xr

da = xr.open_dataarray(my_file)  # lazily open the file, no dask chunks
for t in range(0, da.sizes['time'], 1000):
    da_actual = da.isel(time=slice(t, t + 1000)).load()  # materialize just this slice into memory
    # do some compute with da_actual
Any pointers or ideas on how to achieve this would be appreciated.
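For reference, here is a self-contained version of the access pattern I'm after; the file name, dimension sizes, and the variable name are just placeholders for illustration:

import numpy as np
import xarray as xr

# build a small test file standing in for my real data
xr.DataArray(
    np.random.rand(5000, 10),
    dims=('time', 'x'),
    name='foo',
).to_netcdf('my_file.nc')

da = xr.open_dataarray('my_file.nc')  # lazy open, no dask involved
for t in range(0, da.sizes['time'], 1000):
    # the hope is that .load() reads only this slice from disk
    da_actual = da.isel(time=slice(t, t + 1000)).load()
    print(t, float(da_actual.mean()))  # stand-in for my actual compute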