Python, adding a Water-Year time variable in an X-array

Question

I have the following xarray Dataset named 'scratch'. The lat, long and lev coordinates have been eliminated, so time is the only dimension. It has several variables and is now a multivariate daily time series from 2002 to 2014. I need to add a new variable, "water_year", that gives the water year each day belongs to. This could perhaps be done by adding another variable with Dataset.assign or with Dataset.resample, but I am not sure and could use some help. Note: a water year starts on Oct 01 and ends on Sep 30 of the next year, so water year 2003 runs from 10-01-2002 to 09-30-2003.

See my Xarray here

Like this?

– FObersteiner, Commented May 17, 2022 at 4:51

Michael Delgado · Accepted Answer · 2022-05-18 16:39:40Z

I'll create a sample dataset with a single variable for this example:

In [1]: import numpy as np, pandas as pd, xarray as xr

In [2]: scratch = xr.Dataset(
   ...:     {'Baseflow': (('time', ), np.random.random(4018))},
   ...:     coords={'time': pd.date_range('2002-10-01', freq='D', periods=4018)},
   ...: )

In [3]: scratch
Out[3]:
<xarray.Dataset>
Dimensions:   (time: 4018)
Coordinates:
  * time      (time) datetime64[ns] 2002-10-01 2002-10-02 ... 2013-09-30
Data variables:
    Baseflow  (time) float64 0.7588 0.05129 0.9914 ... 0.7744 0.6581 0.8686

We can build a water_year array using the Datetime Components accessor .dt:

In [4]: water_year = (scratch.time.dt.month >= 10) + scratch.time.dt.year
   ...: water_year
Out[4]:
<xarray.DataArray (time: 4018)>
array([2003, 2003, 2003, ..., 2013, 2013, 2013])
Coordinates:
  * time     (time) datetime64[ns] 2002-10-01 2002-10-02 ... 2013-09-30
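The expression works because the boolean result of `month >= 10` is treated as 0 or 1 when added to the year. A minimal sketch that checks the behaviour on both sides of the 1 October boundary, using four dates picked purely for illustration (the explicit `.astype(int)` is optional and only makes the cast visible):

```python
import pandas as pd
import xarray as xr

# Four dates straddling the 1 October boundary (illustrative values only).
time = xr.DataArray(
    pd.to_datetime(["2002-09-30", "2002-10-01", "2003-09-30", "2003-10-01"]),
    dims="time",
    name="time",
)

# True counts as 1, so October-December dates move into the next year.
water_year = time.dt.year + (time.dt.month >= 10).astype(int)
boundary_years = water_year.values.tolist()
print(boundary_years)  # [2002, 2003, 2003, 2004]
```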

Because water_year is a DataArray indexed by an existing dimension, we can just add it as a coordinate and xarray will understand that it's a non-dimension coordinate. This is important to make sure we don't create a new dimension in our data.

In [7]: scratch.coords['water_year'] = water_year

In [8]: scratch
Out[8]:
<xarray.Dataset>
Dimensions:     (time: 4018)
Coordinates:
  * time        (time) datetime64[ns] 2002-10-01 2002-10-02 ... 2013-09-30
    water_year  (time) int64 2003 2003 2003 2003 2003 ... 2013 2013 2013 2013
Data variables:
    Baseflow    (time) float64 0.7588 0.05129 0.9914 ... 0.7744 0.6581 0.8686
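Assigning through `scratch.coords[...]` modifies the dataset in place. If you would rather keep the original untouched, `Dataset.assign_coords` returns a new object instead. A small sketch, rebuilding the sample dataset so it runs on its own (the name `with_wy` is just a placeholder):

```python
import numpy as np
import pandas as pd
import xarray as xr

# Same sample data as in the answer, rebuilt here so the block is self-contained.
time = pd.date_range("2002-10-01", freq="D", periods=4018)
scratch = xr.Dataset(
    {"Baseflow": (("time",), np.random.random(time.size))},
    coords={"time": time},
)

water_year = scratch.time.dt.year + (scratch.time.dt.month >= 10)

# assign_coords returns a new Dataset; `scratch` itself is left unchanged.
with_wy = scratch.assign_coords(water_year=water_year)

print(with_wy.water_year.dims)   # water_year is indexed by time
print(list(with_wy.sizes))       # time is still the only dimension
```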

Because water_year is indexed by time, we still need to select from the arrays using the time dimension, but we can subset the arrays to specific water years:

In [9]: scratch.sel(time=(scratch.water_year == 2010))
Out[9]:
<xarray.Dataset>
Dimensions:     (time: 365)
Coordinates:
  * time        (time) datetime64[ns] 2009-10-01 2009-10-02 ... 2010-09-30
    water_year  (time) int64 2010 2010 2010 2010 2010 ... 2010 2010 2010 2010
Data variables:
    Baseflow    (time) float64 0.441 0.7586 0.01377 ... 0.2656 0.1054 0.6964
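As a sanity check, the boolean-mask selection should return the same days as slicing the time axis by the water year's start and end dates. A sketch under the same sample-data assumptions as above:

```python
import numpy as np
import pandas as pd
import xarray as xr

# Rebuild the sample dataset with the water_year coordinate attached.
time = pd.date_range("2002-10-01", freq="D", periods=4018)
scratch = xr.Dataset(
    {"Baseflow": (("time",), np.random.random(time.size))},
    coords={"time": time},
)
scratch.coords["water_year"] = scratch.time.dt.year + (scratch.time.dt.month >= 10)

# Selection by boolean mask on the non-dimension coordinate ...
by_mask = scratch.sel(time=(scratch.water_year == 2010))

# ... and by an explicit (inclusive) date range for water year 2010.
by_slice = scratch.sel(time=slice("2009-10-01", "2010-09-30"))

same_days = np.array_equal(by_mask.time.values, by_slice.time.values)
print(by_mask.sizes["time"], same_days)
```

The mask form scales better when you need several water years at once, since the date ranges do not have to be spelled out.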

Groupby operations can use non-dimension coordinates directly, so the following works:

In [10]: scratch.groupby('water_year').sum()
Out[10]:
<xarray.Dataset>
Dimensions:     (water_year: 11)
Coordinates:
  * water_year  (water_year) int64 2003 2004 2005 2006 ... 2010 2011 2012 2013
Data variables:
    Baseflow    (water_year) float64 187.6 186.4 184.7 ... 185.2 189.6 192.7
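If your own record does not start exactly on 1 October, the first and last water years will be incomplete and their sums will not be comparable with the others. One possible way to handle this, sketched here with an assumed 365-day threshold and placeholder names (`days_per_year`, `complete`), is to count the days per water year and drop the short ones before aggregating:

```python
import numpy as np
import pandas as pd
import xarray as xr

# Rebuild the sample dataset with the water_year coordinate attached.
time = pd.date_range("2002-10-01", freq="D", periods=4018)
scratch = xr.Dataset(
    {"Baseflow": (("time",), np.random.random(time.size))},
    coords={"time": time},
)
scratch.coords["water_year"] = scratch.time.dt.year + (scratch.time.dt.month >= 10)

# Number of daily values in each water year.
days_per_year = scratch.Baseflow.groupby("water_year").count()

# Keep only water years with at least 365 days.
# (The threshold is an assumption; adjust it to your own completeness rule.)
is_complete = (days_per_year >= 365).values
complete_years = days_per_year.water_year.values[is_complete]
complete = scratch.sel(time=scratch.water_year.isin(complete_years))

annual_mean = complete.groupby("water_year").mean()
```

With the sample data every water year is complete, so nothing is dropped; the filter only matters for records with partial years at either end.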

Excellent Michael, that is exactly what I was looking for. Thank you so much for the effort explaining this.
