
How would I go about creating and returning a file new_zarr.zarr from an xarray Dataset?

I know xarray.Dataset.to_zarr() exists, but it returns a ZarrStore and I must return a bytes-like object.

I have tried using the tempfile module but am unsure how to proceed. How would I write an xarray.Dataset to a bytes-like object that returns a .zarr file that can be downloaded?

1 Answer

Zarr supports multiple storage backends (DirectoryStore, ZipStore, etc.). If you are looking for a single file object, it sounds like the ZipStore is what you want.

import xarray as xr
import zarr

ds = xr.tutorial.open_dataset('air_temperature')
store = zarr.storage.ZipStore('./new_zarr.zip', mode='w')
ds.to_zarr(store)
store.close()  # a ZipStore must be closed so the zip file is finalized on disk

The zip file can be thought of as a single file zarr store and can be downloaded (or moved around as a single store).


Update 1

If you want to do this all in memory, you could extend zarr.ZipStore to allow passing in a BytesIO object:

import os
import zipfile
from threading import RLock

import zarr


class MyZipStore(zarr.ZipStore):
    
    def __init__(self, path, compression=zipfile.ZIP_STORED, allowZip64=True, mode='a',
                 dimension_separator=None):

        # store properties
        if isinstance(path, str):  # this is the only change needed to make this work
            path = os.path.abspath(path)
        self.path = path
        self.compression = compression
        self.allowZip64 = allowZip64
        self.mode = mode
        self._dimension_separator = dimension_separator

        # Current understanding is that zipfile module in stdlib is not thread-safe,
        # and so locking is required for both read and write. However, this has not
        # been investigated in detail, perhaps no lock is needed if mode='r'.
        self.mutex = RLock()

        # open zip file
        self.zf = zipfile.ZipFile(path, mode=mode, compression=compression,
                                  allowZip64=allowZip64)

Then you can create the zip file in memory:

import io

zip_buffer = io.BytesIO()
store = MyZipStore(zip_buffer)
ds.to_zarr(store)
store.close()  # writes the zip central directory, completing the archive

You'll notice that the zip_buffer contains a valid zip file:

zip_buffer.getvalue()[:10]  # getvalue() does not depend on the current stream position
b'PK\x03\x04\x14\x00\x00\x00\x00\x00'

(PK\x03\x04 is the Zip file magic number)
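The following is a minimal sketch, using only the standard library, of why the archive should be closed before the buffer's bytes are returned. It assumes nothing about zarr itself: a bare zipfile.ZipFile stands in for the store, and the member name and contents are made up for illustration.

```python
import io
import zipfile

buffer = io.BytesIO()
zf = zipfile.ZipFile(buffer, mode="w", compression=zipfile.ZIP_STORED)
zf.writestr(".zgroup", b'{"zarr_format": 2}')  # illustrative member only

# Before close(): member data is present, but the central directory is not
complete_before_close = zipfile.is_zipfile(io.BytesIO(buffer.getvalue()))

zf.close()  # writes the central directory at the end of the buffer

# After close(): the bytes form a complete archive that can be returned
archive_bytes = buffer.getvalue()
complete_after_close = zipfile.is_zipfile(io.BytesIO(archive_bytes))
```

The same reasoning suggests closing the store before handing zip_buffer.getvalue() to whatever serves the download.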


2 Comments

Thanks for that. My main objective is to write and return a zarr dataset in memory without having to write to disk. Would I be able to do the following: z_file = ds.to_zarr(zarr.MemoryStore()), then open(shutil.make_archive('file_name', 'zip', z_file), 'rb').read()? When I try that, I don't seem to get bytes returned.
I've updated the answer to address the completely-in-memory use case. This should be supported in Zarr directly, and as it turns out, there is already an open issue for this: github.com/zarr-developers/zarr-python/issues/1018
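As a rough sketch of the in-memory idea raised in the first comment: shutil.make_archive expects a directory path on disk, so it cannot take a store object. Assuming the store behaves like a mapping from string keys to bytes (zarr-python 2.x stores are MutableMappings; a plain dict stands in here), each key can be written as one zip member. The function name store_to_zip_bytes and the sample keys are hypothetical, and whether the result opens cleanly as a zarr ZipStore has not been verified here.

```python
import io
import zipfile
from typing import Mapping


def store_to_zip_bytes(store: Mapping[str, bytes]) -> bytes:
    """Pack every key/value pair of a mapping-style store into zip bytes."""
    buffer = io.BytesIO()
    # ZIP_STORED adds no second compression layer on top of the chunks
    with zipfile.ZipFile(buffer, mode="w", compression=zipfile.ZIP_STORED) as zf:
        for key in sorted(store):
            zf.writestr(key, bytes(store[key]))
    # Leaving the with-block closed the archive, so the buffer is complete
    return buffer.getvalue()


# Stand-in for a populated in-memory store (keys and contents are made up)
fake_store = {
    ".zgroup": b'{"zarr_format": 2}',
    "air/0.0.0": b"\x00\x01\x02\x03",
}
zip_bytes = store_to_zip_bytes(fake_store)
```

The returned bytes object can be sent as the download body directly, with no temporary file involved.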
