
I have a fairly large NumPy array that I need to perform an operation on, but when I do so, my ~2GB array requires ~30GB of RAM. I've read that NumPy can be fairly clumsy with memory usage, but this seems excessive.

Does anyone know of an alternative way to apply these operations to limit the RAM load? Perhaps row-by-row/in place etc.?

Code below (ignore the meaningless calculation, in my code the coefficients vary):

import xarray as xr 
import numpy as np

def optimise(data):

    data_scaled_offset = (((data - 1000) * (1 / 1)) + 1).round(0)
    return data_scaled_offset.astype(np.uint16)

# This could also be float32 but I'm using uint16 here to reduce memory load for demo purposes
ds = np.random.randint(0, 12000, size=(40000,30000), dtype=np.uint16)

ds = optimise(ds) # Results in ~30GB RAM usage
  • I would try splitting it into multiple smaller arrays, optimising them one by one, and then concatenating them. Commented Sep 24, 2019 at 11:06
  • Part of the reason your memory usage is increasing is that division, even if it is by one, promotes your np.uint16 (2 bytes per number) to float64 (8 bytes per number) - see the sketch after these comments. Commented Sep 24, 2019 at 12:31
  • This example is not really representative, as most of it could be cancelled away. What's the real operation you're trying to do? Commented Sep 24, 2019 at 12:50
  • This is the real operation; the data_scaled_offset line results in huge RAM usage. Yes, the memory reduces when the variables are no longer in scope, but the spike in memory is what I want to reduce. Commented Sep 24, 2019 at 12:54
  • If that's the case, why don't you simplify it to ds - 999 and by doing so get rid of all the intermediate arrays? Commented Sep 24, 2019 at 12:59
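
To illustrate the promotion the comments describe, here is a minimal sketch (the array is smaller than the question's so it runs quickly; the exact intermediate dtype depends on your NumPy version, typically float64 on recent releases):

import numpy as np

data = np.random.randint(0, 12000, size=(4000, 3000), dtype=np.uint16)
print(data.dtype, data.nbytes)        # uint16 -> 2 bytes per element

step = (data - 1000) * (1 / 1)        # the Python float (1 / 1) promotes the result to a float dtype
print(step.dtype, step.nbytes)        # typically float64 -> 8 bytes per element

result = (step + 1).round(0).astype(np.uint16)
print(result.dtype, result.nbytes)    # cast back to uint16 only at the end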

2 Answers


By default, operations like multiplication, addition and many others allocate a new array for the result. You can instead use numpy.multiply, numpy.add, etc. and pass the out parameter so an existing array is used to store the result, which significantly reduces the memory usage. Please see the demo below and translate your code to use those functions instead; a sketch of how this could look for the question's optimise function follows the demo.

arr = np.random.rand(100)
arr2 = np.random.rand(100)

arr3 = np.subtract(arr, 100, out=arr)  # result written into arr, no new array allocated
arr4 = arr + 100                       # allocates a new array for the result
arr5 = np.add(arr, arr2, out=arr2)     # result written into arr2
arr6 = arr + arr2                      # allocates a new array for the result

print(arr is arr3) # True
print(arr is arr4) # False
print(arr2 is arr5) # True
print(arr2 is arr6) # False
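
Applied to the question's operation, this could look roughly like the sketch below; the float32 working buffer and the name optimise_lowmem are my own choices for illustration, not part of the answer:

import numpy as np

def optimise_lowmem(data):
    # one float32 working buffer instead of several float64 temporaries
    buf = data.astype(np.float32)
    np.subtract(buf, 1000, out=buf)   # data - 1000, written back into buf
    np.multiply(buf, 1 / 1, out=buf)  # placeholder coefficient, as in the question
    np.add(buf, 1, out=buf)           # + 1, written back into buf
    np.round(buf, 0, out=buf)         # round in place
    return buf.astype(np.uint16)

Peak usage is then roughly the uint16 input, one float32 buffer and the uint16 result, rather than several float64 intermediates.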

1 Comment

This seems to work great, thank you - I'm just trying the other answer too to see which performs best in terms of time etc.

You could use e.g. Numba or Cython to reduce memory usage. Of course a simple Python loop would also be possible, but very slow.

With allocated output array

import numpy as np
import numba as nb

@nb.njit()
def optimise(data):
    data_scaled_offset=np.empty_like(data)
    # Inversely apply the scale and offset for this product
    for i in range(data.shape[0]):
        for j in range(data.shape[1]):
            data_scaled_offset[i, j] = np.round_(((data[i, j] - 1000) * (1 / 1)) + 1, 0)

    return data_scaled_offset

In-Place

@nb.njit()
def optimise_in_place(data):
    # Inversely apply the scale and offset for this product
    for i in range(data.shape[0]):
        for j in range(data.shape[1]):
            data[i, j] = np.round_(((data[i, j] - 1000) * (1 / 1)) + 1, 0)

    return data
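
Assuming these functions compile with your Numba version, calling them could look like the minimal sketch below (the array is smaller than the question's, purely for demonstration):

import numpy as np

ds = np.random.randint(0, 12000, size=(4000, 3000), dtype=np.uint16)

out = optimise(ds)        # fills a freshly allocated array of the same shape and dtype
optimise_in_place(ds)     # overwrites ds element by element, no extra allocation

print(out.dtype, out.shape)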

1 Comment

Thanks for your response - this works fine but the other method appears to be slightly faster and uses less memory from a quick analysis on a test array.
