My usual method for extracting the min/max of a variable's data values from a NetCDF file is an order of magnitude slower with the netCDF4 Python module than with scipy.io.netcdf.
I am working with relatively large ocean model output files (from ROMS) with multiple depth levels over a given map region (Hawaii). When these were in NetCDF-3, I used scipy.io.netcdf.
Now that these files are in NetCDF-4 ("Classic") I can no longer use scipy.io.netcdf and have instead switched over to using the netCDF4 Python module. However, the slowness is a concern and I wondered if there is a more efficient method of extracting a variable's data range (minimum and maximum data values)?
Here was my NetCDF-3 method using scipy:
import scipy.io.netcdf
netcdf = scipy.io.netcdf.netcdf_file(file)
var = netcdf.variables['sea_water_potential_temperature']
min = var.data.min()
max = var.data.max()
Here is my NetCDF-4 method using netCDF4:
import netCDF4
netcdf = netCDF4.Dataset(file)
var = netcdf.variables['sea_water_potential_temperature']
var_array = var.data.flatten()
min = var_array.data.min()
max = var_array.data.max()
The notable difference is that I must first flatten the data array in netCDF4, and this operation apparently slows things down.
Is there a better/faster way?
np.array(var.data).max() to avoid the flattening of the netCDF Variable. It's hard to say because the structure of the netCDF file is unknown.
import numpy as np; np.max(var[:]) work?
var[:] is.
@abudis: In order to work, I had to modify your command to np.array(var[:].data).max(). @SpencerHill: yes, that works but is equally slow. These two suggestions each take the same amount of time as my example above: still slow. I suppose the scipy optimizations plus netCDF4 changes that @abudis mentions may be the culprit.
var[:] is already a numpy.ndarray, so there's no need to call the np.array() function on it or access its data attribute. Just var[:].max() does the trick. That doesn't help with the computation speed though. A faster method doesn't immediately come to mind, but other more expert users likely know one.
nco operators, called as a subprocess? Something like: ncwa -y max ...
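Following up on the var[:].max() suggestion, one variant worth trying is to scan the variable in slabs along its first dimension instead of reading it all at once. This is only a sketch and not a confirmed fix for the slowness: it keeps memory use bounded, but whether it is any faster depends on how the file is chunked and compressed. The helper name variable_range and the step parameter are made up for illustration. It assumes the variable has at least one dimension and that NaN/inf values should be ignored along with masked fill values.

```python
import numpy as np


def variable_range(var, step=1):
    """Return (min, max) of a sliceable array-like, scanning along axis 0.

    `var` can be a netCDF4 Variable or a NumPy array; only `var.shape`
    and slicing are used. Masked, NaN and inf values are ignored
    (an assumption of this sketch). If every value is masked or
    invalid, the result is (inf, -inf).
    """
    lo, hi = np.inf, -np.inf
    for start in range(0, var.shape[0], step):
        # Read one slab at a time; mask NaN/inf in addition to any
        # fill values that are already masked.
        block = np.ma.masked_invalid(var[start:start + step])
        if block.count() == 0:
            continue  # nothing valid in this slab
        lo = min(lo, float(block.min()))
        hi = max(hi, float(block.max()))
    return lo, hi


# Stand-in data so the sketch runs without a NetCDF file.
demo = np.arange(24, dtype=float).reshape(4, 3, 2)
demo_min, demo_max = variable_range(demo, step=2)

# With a real file it might look like this (path is a placeholder):
# import netCDF4
# with netCDF4.Dataset(path) as nc:
#     var = nc.variables['sea_water_potential_temperature']
#     vmin, vmax = variable_range(var)
```

Raising step reads more of the first dimension per iteration, which trades memory for fewer reads; timing a few values against the actual files would show whether it helps at all.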