5

I am using data from multiple netCDF files (in a folder on my computer). Each file holds data for the entire USA for a period of 5 years. Locations are referenced by the index of an x and y coordinate. I am trying to create a time series for multiple locations (grid cells), compiling the 5-year periods into a 20-year period (combining 4 files). Right now I am able to extract the data from all files for one location and compile it into an array using numpy append. However, I would like to extract the data for multiple locations, placing it into a matrix where the rows are the locations and the columns contain the time series of precipitation data. I think I have to create a list or dictionary, but I am not sure how to allocate the data to the list/dictionary within a loop.

I am new to Python and netCDF, so forgive me if this has an easy solution. I have been using this code as a guide, but haven't figured out how to adapt it for what I'd like to do: Python Reading Multiple NetCDF Rainfall files of variable size

Here is my code:

import glob
from netCDF4 import Dataset
import numpy as np

# Define x & y index for grid cell of interest
# Pittsburgh is (37, 89)
yindex = 37  #first number
xindex = 89  #second number

# Path
path = '/Users/LMC/Research Data/NARCCAP/'  
folder = 'MM5I_ccsm/'

## load data file names    
all_files = glob.glob(path + folder+'*.nc')
all_files.sort()

## initialize np arrays of timeperiods and locations
yindexlist = [yindex, 38, 39]          # y indices for all grid cells of interest (ints, not strings)
xindexlist = [xindex, xindex, xindex]  # x indices for all grid cells of interest
ngridcell = len(yindexlist)
ntimestep = 58400  # This is for 4 files of 14600 timesteps

## Initialize np array
timeseries_per_gridcell = np.empty(0)

## START LOOP FOR FILE IMPORT
for datafile in all_files:
    fh = Dataset(datafile, mode='r')
    days = fh.variables['time'][:]
    lons = fh.variables['lon'][:]
    lats = fh.variables['lat'][:]
    precip = fh.variables['pr'][:]

    for i in range(1):  # loops over only the first grid cell for now
        timeseries_per_gridcell = np.append(timeseries_per_gridcell, precip[:, yindexlist[i], xindexlist[i]] * 10800)

    fh.close()

print(timeseries_per_gridcell)

I put 3 files on Dropbox so you could access them, but I am only allowed to post 2 links. Here they are:

https://www.dropbox.com/s/rso0hce8bq7yi2h/pr_MM5I_ccsm_2041010103.nc?dl=0 https://www.dropbox.com/s/j56undjvv7iph0f/pr_MM5I_ccsm_2046010103.nc?dl=0

3
  • This is the last file: dropbox.com/s/5g576034ark3bnw/pr_MM5I_ccsm_2051010103.nc?dl=0 Commented Jun 19, 2015 at 20:10
  • Check out the MFDataset class of the netCDF4-python package. It automatically combines netCDF files into a single netCDF4.MFDataset object in Python, concatenating them along the time axis. Commented Jun 20, 2015 at 16:56
  • @SpencerHill can you provide an updated link to MFDataset? The link you provided is broken. Also any tips on how to use would be awesome if it's not in your link. Thanks so much. Commented Dec 25, 2019 at 21:29

5 Answers

13

Nice start; I would recommend the following to help solve your issues.

First, check out ncrcat to quickly concatenate your individual netCDF files into a single file. I highly recommend downloading NCO for netCDF manipulations, especially in this instance where it will ease your Python coding later on.

Let's say the files are named precip_1.nc, precip_2.nc, precip_3.nc, and precip_4.nc. You could concatenate them along the record dimension to form a new precip_all.nc with a record dimension of length 58400:

ncrcat precip_1.nc precip_2.nc precip_3.nc precip_4.nc -O precip_all.nc

In Python we now just need to read in that new single file and then extract and store the time series for the desired grid cells. Something like this:

import netCDF4
import numpy as np

yindexlist = [1, 2, 3]
xindexlist = [4, 5, 6]
ngridcell = len(yindexlist)
ntimestep = 58400

# Define an empty 2D array to store time series of precip for a set of grid cells
timeseries_per_grid_cell = np.zeros([ngridcell, ntimestep])

ncfile = netCDF4.Dataset('path/to/file/precip_all.nc', 'r')

# Note that precip is 3D, so read in all dimensions
precip = ncfile.variables['pr'][:,:,:]  # the variable is named 'pr' in these files

for i in range(ngridcell):
    timeseries_per_grid_cell[i,:] = precip[:, yindexlist[i], xindexlist[i]]

ncfile.close()

If you have to use Python only, you'll need to keep track of the chunk of time indices that each individual file covers in the full time series. 58400/4 = 14600 time steps per file, so you'll have another loop to read in each file and store the corresponding slice of times, i.e. the first file populates indices 0-14599, the second 14600-29199, etc.
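A minimal sketch of that bookkeeping, using small in-memory numpy arrays with toy sizes in place of the per-file 'pr' variables (the real files would have 14600 steps each, and the stand-in values here are made up purely for illustration):

```python
import numpy as np

# Toy sizes standing in for the real ones (4 files x 14600 steps each)
nfiles = 4
steps_per_file = 5  # would be 14600 for the real files
ngridcell = 3
ntimestep = nfiles * steps_per_file

timeseries_per_grid_cell = np.zeros([ngridcell, ntimestep])

for f in range(nfiles):
    # Stand-in for reading 'pr' from file f: shape (time, y, x),
    # filled with the file number so the slices are easy to check
    precip = np.full((steps_per_file, 10, 10), float(f))
    start = f * steps_per_file      # first file fills 0..steps_per_file-1, etc.
    stop = start + steps_per_file
    for i in range(ngridcell):
        timeseries_per_grid_cell[i, start:stop] = precip[:, i, i]
```

Each pass through the outer loop fills one contiguous slice of columns, so after the loop every row holds the full concatenated time series for one grid cell.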


2 Comments

Finally figured out the NCO and it is perfect. Code works great. Thank you so much.
Great, you can also choose to check off my answer as the one that solved your problem then.
5

I prefer xarray's approach:

import xarray as xr

ds = xr.open_mfdataset('nc_*.nc', combine='by_coords')  # note: 'by_coords'; concat_dim is only needed with combine='nested'

ds.to_netcdf('nc_combined.nc')  # export merged netCDF file
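The same concatenation can also be done in memory with xr.concat, which is handy when you don't want an intermediate file. A small sketch with made-up stand-in datasets (the variable name 'pr' matches the question's files; sizes and values are hypothetical):

```python
import numpy as np
import xarray as xr

# Two tiny stand-ins for consecutive 5-year files (3 time steps each)
parts = [
    xr.Dataset({'pr': ('time', np.arange(3.0) + 3 * i)},
               coords={'time': np.arange(3) + 3 * i})
    for i in range(2)
]

ds = xr.concat(parts, dim='time')  # same effect as open_mfdataset's combine step
series = ds['pr'].values           # full 6-step time series
```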

Comments

4

You can easily merge multiple netCDF files into one using the netCDF4 package in Python. See the example below:

I have four netCDF files: 1.nc, 2.nc, 3.nc, 4.nc. With the command below, all four files are merged into one dataset.

import netCDF4
from netCDF4 import Dataset

dataset = netCDF4.MFDataset(['1.nc','2.nc','3.nc','4.nc'])

1 Comment

How to export the merged dataset into netcdf file?
2

In parallel with N1B4's answer, you can also concatenate the 4 files along their time dimension using CDO from the command line:

cdo mergetime precip1.nc precip2.nc precip3.nc precip4.nc merged_file.nc 

or with wildcards

cdo mergetime precip?.nc merged_file.nc 

and then proceed to read it in as per that answer.

You can add another step from the command line to extract the location of choice by using

cdo remapnn,lon=X/lat=Y merged_file.nc my_location.nc

This picks out the grid cell nearest to your specified lon/lat (X, Y) coordinate. Or, if you prefer, use bilinear interpolation:

cdo remapbil,lon=X/lat=Y merged_file.nc my_location.nc 

4 Comments

Is there another way to concatenate netCDF files with Python without using the NCO package? I was not able to make it work.
I haven't used it myself, but there is a cdo package in python that interfaces cdo, see pypi.org/project/cdo - Otherwise you can call cdo using a system call from the package "os"
Thank you @Adrian Tompkins, I also tried that option and it did not work. I've been digging over the topic and I have tried several options, most of them come from Charlie Zender himself, I don't know what I am doing wrong. That's why I would like to try something else.
but if you are on a linux box and have cdo installed and working then the os.system call has to work as a fall back option...?
2

open_mfdataset requires the dask library to work. So if for some reason you can't use dask, as I can't, then this method is useless.

1 Comment

As it’s currently written, your answer is unclear. Please edit to add additional details that will help others understand how this addresses the question asked. You can find more information on how to write good answers in the help center.
