Using Dask parallelizing in xarray to select data

Question

I am trying to calculate the hours after sunrise over a data array that has a length of ca. 300k (chunk size ca. 900). The resulted array is a dask.array with no problem using xr.apply_ufunc and astroplan functions. However, it appears to be extremely slow when I use this dask.array for filtering data using xr.where(). Where can I improve it?

Here is my workstream:

from astropy.time import Time
from astroplan import Observer
import astropy.units as u
import xarray as xr
import numpy as np

def cal_sunrise_h(lat, lon, mjd):
    points = Observer(longitude=lon*u.deg, latitude=lat*u.deg, elevation=89*u.km)
    times = Time(mjd, format='mjd')
    sunrise = points.sun_rise_time(times, which="previous")
    hours_after_sunrise = (times-sunrise).sec/3600 
    return hours_after_sunrise

# some fake dataset for reproducing the problem
total_len = 300000
chunk_size = 900
mjd = np.linspace(0, 0.1, total_len) + 5.45559e4
latitude = xr.DataArray(np.linspace(-80, 80, total_len), dims='mjd', coords=[mjd])
longitude = xr.DataArray(np.linspace(-180, 180, total_len), dims='mjd', coords=[mjd])
ds = xr.Dataset({'latitude':latitude, 'longitude':longitude}).chunk({'mjd': chunk_size})

# calculate hours after sunrise
hours_after_sunrise = xr.apply_ufunc(cal_sunrise_h, ds.latitude, ds.longitude, ds.mjd, 
                                     output_dtypes=[float], dask='parallelized') #dask.array
# make a filter
sunrise_filter = (hours_after_sunrise>5) #dask.array

# mask out with filter
ds.where(sunrise_filter, drop=True) #super slow!

Brett Morris · Accepted Answer · 2020-04-08 08:50:02Z

1

astroplan was designed to vectorize over targets observed at single observatories, rather than computing the sun rise/set times at many observatories. The code example you shared will compute the location of the sun total_len independent times, which is a very expensive operation. I'm afraid dask doesn't help with that problem. It would be more efficient to compute the position of the sun at each time using astropy's get_sun function, then compute the sunrise time for each location. You can do this using an algorithm just like astroplan's Observer._horiz_cross method. If you have any trouble implementing this, further support can be found in the #astroplan channel of the astropy slack team.

answered Apr 8, 2020 at 8:50

Brett Morris

8311 gold badge6 silver badges14 bronze badges

Sign up to request clarification or add additional context in comments.

1 Comment

LILI1234 Over a year ago

Thanks! Can you give me a clearer code example on how to use get_sun and Observer._horiz_cross for this application?

Collectives™ on Stack Overflow

Using Dask parallelizing in xarray to select data

1 Answer 1

1 Comment

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

1 Comment

Your Answer

Sign up or log in

Post as a guest

Related