I have a fairly large NumPy array that I need to perform an operation on, but when I do so, my ~2GB array ends up requiring ~30GB of RAM. I've read that NumPy can be fairly clumsy with memory usage, but this seems excessive.
Does anyone know of an alternative way to apply these operations that limits the RAM load? Perhaps row-by-row, in place, etc.?
Code below (ignore the meaningless calculation, in my code the coefficients vary):
import xarray as xr
import numpy as np
def optimise(data):
    data_scaled_offset = (((data - 1000) * (1 / 1)) + 1).round(0)
    return data_scaled_offset.astype(np.uint16)
# This could also be float32 but I'm using uint16 here to reduce memory load for demo purposes
ds = np.random.randint(0, 12000, size=(40000,30000), dtype=np.uint16)
ds = optimise(ds) # Results in ~30GB RAM usage
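Instead of calling optimise on the whole array at once, a chunked version along these lines is the kind of row-by-row approach I had in mind (untested sketch; the 1000-row block size is arbitrary):

chunk_rows = 1000                      # arbitrary block size, just for illustration
out = np.empty_like(ds)                # preallocate the uint16 result (~2GB)
for start in range(0, ds.shape[0], chunk_rows):
    stop = start + chunk_rows
    # only this small slice gets promoted to float64 at any one time
    out[start:stop] = optimise(ds[start:stop])

That keeps the float64 temporaries down to one chunk at a time, but it still needs a second ~2GB output array, so I'm not sure it's the best I can do.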
Edit: The promotion from np.uint16 (2 bytes per number) to float64 (8 bytes per number) in the data_scaled_offset line is what causes the huge RAM usage. Yes, the memory drops again once the variables are no longer in scope, but it's the spike in memory that I want to reduce. Could I just compute ds - 999 and by doing so get rid of all the intermediate arrays?
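For the demo coefficients above, would something like this do the whole calculation in place (folding the -1000 and +1 into a single integer step, and ignoring the wrap-around for values below 999)?

# operate directly on ds so no float64 copy is ever allocated;
# (ds - 1000) + 1 collapses to ds - 999 for this demo
np.subtract(ds, 999, out=ds)

I realise the real coefficients include a scale factor, so in practice I might still need one float copy, but hopefully only one rather than several.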