
I am trying to perform mean scaling on a bunch of images (>40k). When I read the images, each of size (3, 256, 256), into an np array, memory usage sits at 40% (of 60 GB, checked with htop). However, when I run arr.std(), the program crashes with a MemoryError, even though usage is still at 40%.

Any thoughts on what might be wrong?

  • Does arr.std() try to allocate a single big chunk of memory? If so, and the requested size is larger than the remaining memory, the request would fail and the memory usage would remain unchanged. Commented Jun 6, 2016 at 18:57

1 Answer


Are you completely sure that each cell of your array takes only 1 byte? By default NumPy may allocate 8 bytes per cell.

I created a small 3 x 3 array and it occupies 72 bytes:

import numpy as np

a = np.array(np.mat('1, 2, 3; 4, 5, 6; 7, 8, 9'))
print(a.nbytes)  # use .nbytes instead of sys.getsizeof

256 x 256 x 3 x 8 bytes = 1,572,864 B = 1.5 MB

1.5 MB x 40,000 = 60,000 MB ≈ 58.6 GB
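A quick back-of-the-envelope check in Python (just a sketch; the 40,000 count and the (3, 256, 256) shape are taken from the question, and only the per-element byte size changes between the two cases):

n_images = 40_000
elements_per_image = 3 * 256 * 256                    # 196,608 elements per image

bytes_float64 = n_images * elements_per_image * 8     # default 8-byte float dtype
bytes_uint8 = n_images * elements_per_image * 1       # 1-byte dtype

print(bytes_float64 / 1024**3)   # ~58.6 GiB -- does not fit next to everything else in 60 GB
print(bytes_uint8 / 1024**3)     # ~7.3 GiB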

And you said that you have at least 40 thousand images, so if you have more than that, and std needs extra memory to flatten the array (see http://docs.scipy.org/doc/numpy-1.9.2/reference/generated/numpy.std.html, which leads you here: https://github.com/numpy/numpy/blob/master/numpy/core/_methods.py), you would run out of memory.
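You can also limit that promotion directly: std accepts a dtype argument for its interim computation, so forcing float32 roughly halves the temporary allocation at some cost in precision (a sketch, assuming arr is your image stack):

std_value = arr.std(dtype=np.float32)  # interim computation in float32 instead of float64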

The solution is very simple: enforce a 1-byte dtype such as int8, or another from here: http://docs.scipy.org/doc/numpy-1.9.2/user/basics.types.html

a = np.array(np.mat('1, 2, 3; 4, 5, 6; 7, 8, 9'), dtype=np.int8)
print(a.nbytes)  # only 9 bytes
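Applied to the image stack itself, the same idea could look roughly like this; image_paths and load_image below are hypothetical placeholders for however you list and decode your files:

import numpy as np

# Preallocate the whole stack with a 1-byte dtype (uint8 suits 8-bit image data)
images = np.empty((40_000, 3, 256, 256), dtype=np.uint8)
for i, path in enumerate(image_paths):   # image_paths: hypothetical list of file paths
    images[i] = load_image(path)         # load_image: however you decode a single image
print(images.nbytes / 1024**3)           # ~7.3 GiB instead of ~58.6 GiB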


To check available memory, try the Pythonic way (instead of htop):

import psutil

m = psutil.virtual_memory()
print(m.available)
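You can use this to compare the available memory with the size of the float64 temporary that std will need before calling it (a sketch; arr is assumed to be your existing uint8 image stack, and the real peak can be somewhat higher than this estimate):

import numpy as np
import psutil

# arr: your (N, 3, 256, 256) uint8 stack, assumed to exist already
needed = arr.size * np.dtype(np.float64).itemsize   # flattened float64 copy used for interim results
available = psutil.virtual_memory().available
print(needed, available, needed < available)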


P.S. Remember that array.nbytes reports only the memory consumed by the array elements, not the auxiliary bytes used for array bookkeeping.


3 Comments

Thanks for the response. I double checked and each cell is taking 1 byte. I was able to load 20,000 images (total image array size = 5.5 GB), and running arr.std() used up 93% of total memory. I suppose this is as far as I can go, but it would be nice to be able to load more.
@Lamikins Yes, you have verified that the cells in each of your images take 1 B, but have you noticed how std is computed? The routine numpy.core._methods._var uses float64 (aka f8) for interim results. Simply try to compute std for only one image and capture the memory usage with psutil before and after std. Then you could either pass a smaller dtype parameter or write your own std. Internally they use sqrt(mean(abs(x - x.mean()) ** 2)), but you can go with another in-place algorithm from en.wikipedia.org/wiki/Algorithms_for_calculating_variance
@Lamikins Your own version should compute the variance in place and then take its square root, but remember to use a numerically stable algorithm such as the compensated variant.
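If you go the route these comments suggest, a minimal sketch of a chunked, compensated computation could look like the code below; it walks the stack in chunks so it never materialises an array-sized float64 temporary, and chunked_std, images, and chunk are just placeholder names:

import numpy as np

def chunked_std(images, chunk=500):
    # Population std (ddof=0, like NumPy's default) over the whole stack,
    # computed chunk by chunk with the compensated two-pass variant from
    # the Wikipedia article linked above.
    n = images.size
    # Pass 1: mean
    total = 0.0
    for start in range(0, len(images), chunk):
        total += images[start:start + chunk].sum(dtype=np.float64)
    mean = total / n
    # Pass 2: compensated sum of squared deviations
    sq_sum = 0.0
    comp = 0.0
    for start in range(0, len(images), chunk):
        d = images[start:start + chunk].astype(np.float64) - mean
        sq_sum += (d * d).sum()
        comp += d.sum()
    return np.sqrt((sq_sum - comp * comp / n) / n)

sigma = chunked_std(images)   # images: the uint8 stack of shape (N, 3, 256, 256)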
