
I am using 32-bit Python 3.4 on Windows 7.

I found that an integer in a NumPy array takes 4 bytes, but in a list it takes 10 bytes.

import sys
import numpy as np

s = 10
# Fill a preallocated list, then drop any unused None slots
lt = [None] * s
cnt = 0
for i in range(0, s):
    lt[cnt] = i
    cnt += 1
lt = [x for x in lt if x is not None]
a = np.array(lt)
print("len(a) is " + str(len(a)) + " size is " + str(sys.getsizeof(a))
      + " bytes " + " a.itemsize is " + str(a.itemsize) + " total size is "
      + str(a.itemsize * len(a)) + " Bytes , len(lt) is "
      + str(len(lt)) + " size is " + str(sys.getsizeof(lt)) + " Bytes ")

   len(a) is 10 size is 40 bytes  a.itemsize is 4 total size is 40 Bytes , len(lt) is 10 size is 100 Bytes the first element has 12 Bytes

Is that because, in a list, each element has to keep a pointer to the next element?

If I assign a string to each list element instead:

  lt[cnt] = "A"

  len(a) is 10 size is 40 bytes  a.itemsize is 4 total size is 40 Bytes , len(lt) is 10 size is 100 Bytes the first element has 30 Bytes

So in the array each element takes 4 bytes, while in the list it takes 30 bytes.

But if I try:

  lt[cnt] = "AB"

  len(a) is 10 size is 40 bytes  a.itemsize is 8 total size is 80 Bytes , len(lt) is 10 size is 100 Bytes the first element has 33 Bytes

In the array each element takes 8 bytes, but in the list it takes 33 bytes.

And if I try:

  lt[cnt] = "csedvserb revrvrrw gvrgrwgervwe grujy oliulfv qdqdqafwg5u u56i78k8 awdwfw"  # 73 characters long

  len(a) is 10 size is 40 bytes  a.itemsize is 292 total size is 2920 Bytes , len(lt) is 10 size is 100 Bytes the first element has 246 Bytes

In the array each element takes 292 bytes (= 73 * 4), but in the list it takes 246 bytes. Why?

Any explanation will be appreciated.

  • How do you get that first element size? sys.getsizeof(lt[0])? Commented Oct 31, 2016 at 7:11

1 Answer


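The element size in arrays is easy: it's determined by the dtype and, as your code shows, can be found with .itemsize. 4 bytes is common, for example for np.int32 and np.float32; 64-bit types such as np.int64 and np.float64 take 8 bytes. NumPy Unicode strings are allocated 4 bytes per character, and every element is as wide as the longest string in the array (dtype '<U2' for 'AB', '<U73' for your 73-character string, hence 73 * 4 = 292 bytes). Python's own str objects are more compact: since Python 3.3 a str stores 1, 2 or 4 bytes per character depending on the widest character it contains, so an ASCII string uses 1 byte per character plus some fixed overhead.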
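A minimal sketch reproducing the array item sizes from the question. It assumes an explicit dtype=np.int32 for the integers, because NumPy's default integer type depends on the platform and NumPy version; the variable names are illustrative only.

```python
import numpy as np

long_text = "csedvserb revrvrrw gvrgrwgervwe grujy oliulfv qdqdqafwg5u u56i78k8 awdwfw"

# Integer dtype chosen explicitly; the default integer type is platform dependent.
ints = np.array(range(10), dtype=np.int32)
# Fixed-width Unicode arrays: 4 bytes per character, sized to the longest string.
one = np.array(['A'] * 10)
two = np.array(['AB'] * 10)
text = np.array([long_text] * 10)
mixed = np.array(['A', 'AB', 'ABCDEF'])  # every element gets the width of 'ABCDEF'

for arr in (ints, one, two, text, mixed):
    # itemsize * len(arr) equals arr.nbytes, the size of the data buffer
    print(arr.dtype, arr.itemsize, arr.itemsize * len(arr), arr.nbytes)
```

The mixed array shows why the width is fixed: even the one-character 'A' occupies 24 bytes, so that every element can be located by simple offset arithmetic.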

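The per-element size for lists (and tuples) is trickier. A list does not contain the elements directly; rather, it contains pointers to objects which are stored elsewhere. These are not links from one element to the next: the list is a contiguous array of pointers. sys.getsizeof(lt) counts the list header plus the pointer slots, including some spare slots (a pad) that let the list grow efficiently with .append. It does not include the objects the pointers refer to. That is why all your lists have the same size, regardless of the 'first item' size.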

My data:

In [2324]: lt=[None]*10
In [2325]: sys.getsizeof(lt)
Out[2325]: 72
In [2326]: lt=[i for i in range(10)]
In [2327]: sys.getsizeof(lt)
Out[2327]: 96
In [2328]: lt=['A' for i in range(10)]
In [2329]: sys.getsizeof(lt)
Out[2329]: 96
In [2330]: lt=['AB' for i in range(10)]
In [2331]: sys.getsizeof(lt)
Out[2331]: 96
In [2332]: lt=['ABCDEF' for i in range(10)]
In [2333]: sys.getsizeof(lt)
Out[2333]: 96
In [2334]: lt=[None for i in range(10)]
In [2335]: sys.getsizeof(lt)
Out[2335]: 96
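In the session above, [None]*10 reports 72 bytes while every comprehension reports 96, including [None for i in range(10)], so the difference comes from how the list was built rather than from its contents. A minimal sketch, assuming CPython (where a list's reported size is its header plus one pointer per allocated slot); slot_count is a hypothetical helper:

```python
import struct
import sys

PTR = struct.calcsize('P')         # bytes per pointer slot (4 on 32-bit, 8 on 64-bit)
EMPTY = sys.getsizeof([])          # list header with no slots allocated

def slot_count(lst):
    """Hypothetical helper: number of pointer slots a CPython list has allocated."""
    return (sys.getsizeof(lst) - EMPTY) // PTR

exact = [None] * 10                # repetition allocates exactly 10 slots
grown = [None for _ in range(10)]  # comprehension grows by appending, keeping spare slots

print(slot_count(exact), slot_count(grown))
```

On CPython the second count is larger than 10: those spare slots are the pad that makes repeated .append cheap.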

And for the corresponding arrays:

In [2344]: lt=[None]*10; a=np.array(lt)
In [2345]: a
Out[2345]: array([None, None, None, None, None, None, None, None, None, None], dtype=object)
In [2346]: a.itemsize
Out[2346]: 4
In [2347]: lt=['AB' for i in range(10)]; a=np.array(lt)
In [2348]: a
Out[2348]: 
array(['AB', 'AB', 'AB', 'AB', 'AB', 'AB', 'AB', 'AB', 'AB', 'AB'], 
      dtype='<U2')
In [2349]: a.itemsize
Out[2349]: 8

When the list contains None, the array has object dtype, and the elements are all pointers (4 bytes each on this 32-bit build, 8 bytes on a 64-bit build).
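A quick way to check that an object array's itemsize is just the pointer size of the running interpreter. The sketch uses struct.calcsize('P') as the reference for the C pointer size; variable names are illustrative.

```python
import struct
import numpy as np

ptr_size = struct.calcsize('P')   # size of a C pointer in this Python build
obj = np.array([None] * 10)       # None has no native NumPy type, so dtype=object

# Each element of an object array is a pointer to a Python object.
print(obj.dtype, obj.itemsize, ptr_size)
print(obj[0] is None)             # the slot refers to the None singleton itself
```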


1 Comment

32-bit types are 4 bytes each, but 64-bit types (e.g., np.float64) are 8 bytes each.
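A small check of the comment's point, comparing item sizes of 32-bit and 64-bit numeric dtypes; the dictionary name is illustrative.

```python
import numpy as np

# Bytes per element for 32-bit and 64-bit numeric types
sizes = {np.dtype(t).name: np.dtype(t).itemsize
         for t in (np.int32, np.float32, np.int64, np.float64)}
print(sizes)

# Ten float64 values need twice the buffer of ten float32 values
print(np.zeros(10, dtype=np.float32).nbytes, np.zeros(10, dtype=np.float64).nbytes)
```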
