I have two raster image geotiffs, one of (275521, 329643) dims (let's call this the yearly image) and the other of (73583, 152367) dims (the monthly image). The cell sizes of both rasters are already the same. Using Python, I would like to do the following:
- Clip the yearly image so that it is the same as the monthly image extents.
- Match the extents of the yearly image to monthly image extents.
- Create a binary mask from the yearly image file, which is a categorical dataset (vals 0/1/2/3) such that all cells containing 3 return a value of 1, and all other cells are 0.
- Multiply the monthly image file with this new mask layer to filter out the cells that align with the yearly.data == 3 cells.
I have been working with rasterio and rioxarray in Python on Windows and would like to learn how to do this in Python because I will be looping through 8 yearly image files and for each of those years, 12 monthly image files.
Predictably, I have been running into memory errors, specifically when i try to create the mask from the yearly image,
MemoryError: Unable to allocate 10.4 GiB for an array with shape (1, 73583, 152367) and data type uint8
I understand that I should try and work with some kind of multiprocessing tool and have been trying to implement what I need using Dask but honestly have no idea what I'm doing or how to begin processing these arrays (i.e., creating a massive array [the yearly mask], multiplying this massive array with another massive array [monthly image] to create yet another massive array [masked monthly data]). Would appreciate all and any kind of help or advice, code examples, tutorials? many thanks.
Not sure how to provide the data but if it helps they are image files downloaded from google earth engine. Here's some sample code:
import rasterio as rio
import xarray as xr
import rioxarray
import numpy as np
years = np.arange(2012, 2020)
monthly_file_paths = ["\paths\to\monthly\image\tifs"]
yearly_file_paths = ["\paths\to\yearly\image\tifs"]
monthly_sample = xr.open_rasterio(monthly_file_paths[0], chunks={"x":256, "y":256})
for yearly_file in yearly_file_paths:
yearly = xr.open_rasterio(yearly_file, chunks={"x":256, "y":256}) # !!!: Previously encountered memory error here before including the "chunks" bit
yearly = yearly.rio.set_nodata(0)
# crop yearly layer to monthly bounding box
yearly = yearly.rio.clip_box(minx=monthly_sample.x.min().item(),
miny=monthly_sample.y.min().item(),
maxx=monthly_sample.x.max().item(),
maxy=monthly_sample.y.max().item())
# reproject and resample yearly to match monthly extents and crs
yearly = yearly.rio.reproject_match(monthly_sample )
yearly = yearly.assign_coords({
"x": monthly_sample.x,
"y": monthly_sample.y})
# create mask.
yearly.data = xr.where(yearly==3, 1, 0) # !!!: Error occurs here
for month_file in monthly_file_paths :
_monthly = xr.open_rasterio(month_file)
_monthly.data = xr.where(_monthly==2, 1, 0)
yearly_monthly = yearly * _monthly
yearly_monthly = yearly_monthly.rio.update_attrs(_monthly.attrs)
yearly_monthly.rio.to_raster(f'{month_file}.tif', dtype=np.uint8,
compress='deflate')```