
I have a folder with NetCDF files from 2006-2100, in ten-year blocks (2011-2020, 2021-2030, etc.).

I want to create a new NetCDF file which contains all of these files joined together. So far I have read in the files:

import xarray

ds = xarray.open_dataset('Path/to/file/20062010.nc')
ds1 = xarray.open_dataset('Path/to/file/20112020.nc')
# etc., one open_dataset call per ten-year file

Then merged these like this:

dsmerged = xarray.merge([ds,ds1])

This works, but it is clunky, and there must be a simpler way to automate the process, as I will be doing this for many different folders full of files. Is there a more efficient way to do this?

EDIT:

Trying to join these files using glob:

for filename in glob.glob('path/to/file/*.nc'):
    dsmerged = xarray.merge([filename])

Gives the error:

AttributeError: 'str' object has no attribute 'items'

This is passing only the text of the filename, not the actual file itself, so it can't be merged. How do I open each file, store it as a variable, and merge them all without doing it bit by bit?

  • How about dsmerged = xarray.merge([xarray.open_dataset(f) for f in glob.glob('path/to/file/*.nc')])? Commented Nov 14, 2017 at 16:29
  • OK, that almost made my computer implode, and after un-crashing it reported a memory error. Might this be due to the size of the files? Perhaps my computer can't handle this? Commented Nov 14, 2017 at 17:00
  • You have more files than your machine's memory can handle. You can test whether the code I provided truly works by limiting the number of files to process, as follows: dsmerged = xarray.merge([xarray.open_dataset(f) for f in glob.glob('path/to/file/*.nc')[:2]]). In this case you are only processing two files; a consolidated sketch follows this comment thread. As for your memory issues, I would advise looking at this. Commented Nov 14, 2017 at 17:05
  • I tried it with fewer files, and it works! Thank you. I will also try to sort out the memory issues as you suggest. Commented Nov 14, 2017 at 17:07
  • If you are using xarray.open_mfdataset, you don't need the xarray.merge operation; it's already handled by xarray.open_mfdataset. Just dsmerged = xarray.open_mfdataset('path/to/file/*.nc') should suffice. Commented Nov 15, 2017 at 14:17
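
Pulling the fixes from this comment thread together: the loop fails because xarray.merge receives filename strings rather than opened datasets. A minimal sketch of the corrected approach suggested above (slice the file list, e.g. [:2], to test on a subset first):

import glob
import xarray

# Open each NetCDF file into a Dataset first; merging raw filename
# strings is what triggers the AttributeError
datasets = [xarray.open_dataset(f) for f in glob.glob('path/to/file/*.nc')]
dsmerged = xarray.merge(datasets)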

1 Answer


If you are looking for a clean way to get all your datasets merged together, you can use a list comprehension and the xarray.merge function to get it done. The following is an illustration:

ds = xarray.merge([xarray.open_dataset(f) for f in glob.glob('path/to/file/*.nc')])

In response to the out-of-memory issues you encountered, that is probably because you have more data than the Python process can hold in memory. The best fix is to use the xarray.open_mfdataset function, which uses the dask library under the hood to break the data into smaller chunks for processing. This is usually more memory-efficient and will often allow you to bring your data into Python. With this function, you do not need a for-loop; you can just pass it a glob string of the form "path/to/my/files/*.nc". The following is equivalent to the previously provided solution, but more memory-efficient:

ds = xarray.open_mfdataset('path/to/file/*.nc')
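
Since the original goal was a single combined NetCDF file on disk, you can then save the merged result with the dataset's to_netcdf method (a minimal sketch; the output filename is just a placeholder):

# Write the combined data out as one new NetCDF file
# ('combined_2006_2100.nc' is a placeholder name)
ds.to_netcdf('path/to/file/combined_2006_2100.nc')

Because open_mfdataset loads the data lazily via dask, the write step should process the data in chunks rather than pulling everything into memory at once.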

I hope this proves useful.


1 Comment

This question has been useful to so many people - thanks again! For anyone reading, the open_mfdataset command has been the best solution for me many times over the years. Very helpful!
