239

I'm using Python to analyse some large files and I'm running into memory issues, so I've been using sys.getsizeof() to try to keep track of the usage, but its behaviour with numpy arrays is bizarre. Here's an example involving a map of albedos that I'm having to open:

>>> import numpy as np
>>> import struct
>>> from sys import getsizeof
>>> f = open('Albedo_map.assoc', 'rb')
>>> getsizeof(f)
144
>>> albedo = struct.unpack('%df' % (7200*3600), f.read(7200*3600*4))
>>> getsizeof(albedo)
207360056
>>> albedo = np.array(albedo).reshape(3600,7200)
>>> getsizeof(albedo)
80

Well, the data's still there, but the size of the object, a 3600x7200 pixel map, has gone from ~200 MB to 80 bytes. I'd like to hope that my memory issues are over and just convert everything to numpy arrays, but I feel that this behaviour, if true, would in some way violate some law of information theory or thermodynamics, or something, so I'm inclined to believe that getsizeof() doesn't work with numpy arrays. Any ideas?

4
  • 10
    From the docs on sys.getsizeof: "Return the size of an object in bytes. The object can be any type of object. All built-in objects will return correct results, but this does not have to hold true for third-party extensions as it is implementation specific. Only the memory consumption directly attributed to the object is accounted for, not the memory consumption of objects it refers to." Commented Aug 2, 2012 at 19:24
  • 1
    This makes getsizeof an unreliable indicator of memory consumption, especially for 3rd party extensions. Commented Aug 2, 2012 at 19:25
  • 20
    Basically, the issue here is that reshape is returning a view, not a new array. You're getting the size of the view, not the actual data. Commented Aug 2, 2012 at 19:26
  • 2
    To that end, sys.getsizeof(albedo.base) will give the size of the non-view. Commented Mar 19, 2020 at 16:08
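
A minimal sketch of the view behaviour described in these comments, scaled down so it runs without the original file. The packed bytes stand in for Albedo_map.assoc, and the names n_rows, n_cols, raw, flat and grid are made up for illustration:

```python
import struct
from sys import getsizeof

import numpy as np

# Small stand-in for the 3600 x 7200 map in the question (assumed sizes).
n_rows, n_cols = 36, 72
count = n_rows * n_cols

# Fake file contents: `count` 4-byte floats, as f.read() would return them.
raw = struct.pack('%df' % count, *([0.5] * count))

values = struct.unpack('%df' % count, raw)   # tuple of Python floats
flat = np.array(values)                      # 1-D float64 array that owns its data
grid = flat.reshape(n_rows, n_cols)          # a view onto flat's buffer

print(grid.base is flat)                  # True: grid does not own the data
print(getsizeof(flat), getsizeof(grid))   # the view reports only its header
print(flat.nbytes, grid.nbytes)           # both report the element bytes
```

The view's getsizeof() is small because the data buffer is attributed to the array that owns it, which is why getsizeof(albedo.base) reports the larger figure.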

4 Answers

358

You can use array.nbytes for numpy arrays, for example:

import numpy as np
from sys import getsizeof
a = [0] * 1024
b = np.array(a)
print(getsizeof(a))
print(b.nbytes)

Output:

8264
8192

5 Comments

b.__sizeof__() is equivalent to sys.getsizeof(b)
round(getsizeof(a) / 1024 / 1024,2) to get MB
@palash No, sys.getsizeof() does a bit more: "If given, default will be returned if the object does not provide means to retrieve the size. ... getsizeof() calls the object's __sizeof__ method and adds an additional garbage collector overhead if the object is managed by the garbage collector." In general, magic methods are not equivalent to the relevant functions. For another example, see Is there any case where len(someObj) does not call someObj's __len__ function?
This answer is not correct when the numpy array uses non-default strides.
How is this possible though? is it a virtual size, not really the actual allocated size? np.zeros((1000,1000,1000,1000), dtype=np.complex128).nbytes
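
The comment about non-default strides can be illustrated with a short sketch. It assumes nothing beyond NumPy itself; the variable names are made up. nbytes counts the elements an array exposes, which can be less or more than the memory that actually backs it:

```python
import numpy as np

# An array that owns 1,000,000 int64 elements (8,000,000 bytes).
base = np.arange(1_000_000, dtype=np.int64)

# A strided view exposing every other element.
every_other = base[::2]
print(every_other.nbytes)        # 4000000, yet the whole buffer of base stays alive
print(every_other.base is base)  # True

# A broadcast view: a million logical elements backed by a single value.
single = np.array(7, dtype=np.int64)
tiled = np.broadcast_to(single, (1000, 1000))
print(tiled.strides)             # (0, 0): every index maps to the same 8 bytes
print(tiled.nbytes)              # 8000000, far more than the 8 bytes in use
```

So nbytes is best read as the product of element count and element size, not as a measurement of allocated memory.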
25

The field nbytes will give you the size in bytes of all the elements of the array in a numpy.array:

size_in_bytes = my_numpy_array.nbytes

Notice that this does not measure "non-element attributes of the array object", so the actual size in bytes can be a few bytes larger than this.
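
As a rough check of that statement, the sketch below compares nbytes with size * itemsize and with getsizeof() for an array that owns its data. The small shape is arbitrary, and the exact overhead depends on the NumPy version and platform, so no specific number is claimed:

```python
from sys import getsizeof

import numpy as np

a = np.zeros((36, 72), dtype=np.float64)

# nbytes is the number of elements times the size of one element.
print(a.nbytes == a.size * a.itemsize)   # True

# For an array that owns its data, getsizeof() adds the object header on top.
overhead = getsizeof(a) - a.nbytes
print(overhead)                          # a small, version-dependent number

# The same arithmetic for the 3600 x 7200 map in the question, assuming float64
# (the default when building an array from Python floats), without allocating it.
question_nbytes = 3600 * 7200 * np.dtype(np.float64).itemsize
print(question_nbytes)                   # 207360000
```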

3 Comments

This answer still creates an array, so I think you mean "without the need to convert from a list to an array". Although it is true that GWW's answer first creates a list and then converts it to an array, that's beside the point, since the OP already has an array... The point is how to get the size of a numpy array, so it's not critical how you got the array in the first place. One could similarly criticize this answer by saying that it reshapes an existing array.
Hello @Moot, thanks for the comment. The question is about how to get the size in bytes of an array. While it is true that my snippet first creates an array, it is only for the purpose of having a complete example that can be executed. I will edit my answer to stress this.
This answer is not correct when the numpy array uses non-default strides.
7

To add more flesh to the accepted answer, here is a summary and a more transparent memory example (note that int8 is one byte):

import numpy as np
from sys import getsizeof
a = np.ones(shape=(1000, 1), dtype='int8')
b = a.T 
a.nbytes, getsizeof(a), b.nbytes, getsizeof(b), getsizeof(b.base)

Will produce the following output:

(1000, 1128, 1000, 128, 1128)
  • a.nbytes = 1000: the size of the numerical elements alone (1000 elements of one byte each).
  • getsizeof(a) = 1128: the size of the numerical elements plus the reference machinery (the array object itself).
  • b.nbytes = 1000: the size of the numerical elements, regardless of where the memory lives; it is not affected by b being a view.
  • getsizeof(b) = 128: only the size of the reference machinery, because b is a view and does not own its data.
  • getsizeof(b.base) = 1128: the size of the numerical elements plus the reference machinery of the array that owns the data, regardless of view status.

Summarizing: if you want the size of the numerical elements, use array.nbytes; it works whether or not the array is a view. If you instead want the size of the numerical elements plus the reference machinery, use getsizeof(array.base) when the array is a view. Note that array.base is None for an array that owns its data, so in that case call getsizeof(array) directly.
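
A minimal sketch that wraps this advice in one function. The name sizeof_with_buffer is hypothetical, not a NumPy API. It follows .base while it points at another ndarray, so it handles both views and arrays whose base is None:

```python
from sys import getsizeof

import numpy as np


def sizeof_with_buffer(arr):
    """Hypothetical helper: getsizeof() of the ndarray that owns arr's data."""
    owner = arr
    # Walk up the chain of views; stop at an array whose base is not an ndarray.
    while isinstance(owner.base, np.ndarray):
        owner = owner.base
    return getsizeof(owner)


a = np.ones(shape=(1000, 1), dtype='int8')
b = a.T  # a view of a

print(a.base is None)                                  # True: a owns its data
print(sizeof_with_buffer(a) == getsizeof(a))           # True
print(sizeof_with_buffer(b) == getsizeof(a))           # True: resolved through b.base
```

If the data ultimately lives in a non-NumPy object (for example a bytes buffer), this sketch only reports the outermost ndarray, so treat the result as an estimate.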


3

In Python notebooks I often want to filter out 'dangling' numpy.ndarray objects, in particular the ones that are stored in _1, _2, etc. and were never really meant to stay alive.

I use this code to get a listing of all of them and their size.

Not sure if locals() or globals() is better here.

import numpy
from humanize import naturalsize  # third-party package

# List every ndarray in the current namespace, smallest first.
for size, name in sorted(
        (value.nbytes, name)
        for name, value in locals().items()
        if isinstance(value, numpy.ndarray)):
    print("{:>30}: {:>8}".format(name, naturalsize(size)))
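
One way to sidestep the locals() versus globals() question is to pass the namespace in explicitly. The sketch below is a variant under that assumption. The function name array_sizes and the demo dictionary are made up for illustration, and it prints raw byte counts so it does not need humanize:

```python
import numpy as np


def array_sizes(namespace):
    """Hypothetical helper: (nbytes, name) pairs for every ndarray in a dict, smallest first."""
    return sorted(
        (value.nbytes, name)
        # Copy the items first so the dict can change while we work.
        for name, value in list(namespace.items())
        if isinstance(value, np.ndarray)
    )


# Stand-in for a notebook namespace.
demo_namespace = {
    "small": np.zeros(10, dtype=np.int8),
    "large": np.zeros(1000, dtype=np.float64),
    "not_an_array": [1, 2, 3],
}

for size, name in array_sizes(demo_namespace):
    print("{:>30}: {:>8} bytes".format(name, size))
```

In a notebook, calling array_sizes(globals()) should cover the same variables as the loop above, since that loop runs at the top level where locals() and globals() are the same dictionary.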

