
I want to merge 70 netCDF files into one. For that I use xarray's .to_netcdf() method:

    ds = xarray.open_mfdataset('*.nc')
    ds.to_netcdf('SST_2021-10_timeseries.nc')

My problem is that my Jupyter notebook always hangs because the number of files is too high. Is there a more efficient way to merge the files?

    Not an answer with netCDF, but another option would be to use a zarr store rather than netCDF (e.g. ds.to_zarr), as zarr has chunking support and parallel writes. Commented Dec 2, 2021 at 17:11

1 Answer


An alternative would be to use nctoolkit. Commands would be as follows:

    import nctoolkit as nc
    ds = nc.open_data('*.nc')
    ds.merge("time")
    ds.to_nc('SST_2021-10_timeseries.nc')

Or you could do it on the command line with CDO:

    cdo -mergetime *.nc SST_2021-10_timeseries.nc

Those options should get around any RAM issues.
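If you would rather stay with xarray, passing chunks= to open_mfdataset keeps the data dask-backed, so the write streams one chunk at a time instead of materialising everything in RAM. A sketch with two tiny stand-in files (filenames and chunk size are illustrative):

```python
import numpy as np
import xarray as xr

# Two tiny stand-in files for the real 70 monthly files
for i, t0 in enumerate([0, 4]):
    xr.Dataset(
        {"sst": (("time",), np.random.rand(4))},
        coords={"time": np.arange(t0, t0 + 4)},
    ).to_netcdf(f"chunk_{i}.nc")

# chunks= keeps each variable as a lazy dask array; to_netcdf then
# streams the chunks to disk rather than loading them all at once
ds = xr.open_mfdataset("chunk_*.nc", combine="by_coords", chunks={"time": 4})
ds.to_netcdf("SST_merged.nc")
```

Whether this resolves the hang depends on how large individual files are relative to the chosen chunks.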


5 Comments

Seems like I can't open all datasets with the asterisk in this line: ds = nc.open_data('*.nc'). I then get a message that there is no dataset with the name *.nc
Which version of nctoolkit have you installed? Conda has a tendency to install an old version, so the version you have may not accept wildcards
nctoolkit-0.3.9, I think that's the latest, right?
OK. That should work. But try: ds = nc.open_data(nc.create_ensemble(".", recursive = False)). What OS are you using?
though the string "*.nc" should be replaced by glob.glob("*.nc") by nctoolkit (line 436: github.com/pmlmodelling/nctoolkit/blob/master/nctoolkit/api.py). So I'm puzzled by what's going on
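One workaround for the wildcard issue discussed above is to expand the pattern yourself and pass an explicit, sorted file list. The glob expansion below is standard library; the nctoolkit continuation is shown as comments since it additionally requires nctoolkit and CDO to be installed:

```python
import glob

# Expand the wildcard ourselves instead of relying on nctoolkit to do it
files = sorted(glob.glob("*.nc"))

# Continuation with nctoolkit (requires nctoolkit + CDO):
# import nctoolkit as nc
# ds = nc.open_data(files)   # open_data also accepts a list of paths
# ds.merge("time")
# ds.to_nc("SST_2021-10_timeseries.nc")
```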
