I have a huge array (of arrays) of integers in the range 0-255. Since I know the range of the integers, I want to reduce the space they occupy by storing each integer in a single byte.

In C++, I would simply use char to store the integers, but I cannot find a way to do this in Python.

>>> import sys
>>> a = 10
>>> sys.getsizeof(a)
24
>>> b = chr(a)
>>> sys.getsizeof(b)
38
>>> c = bytearray(1)
>>> c[0] = b
>>> c[0]
10
>>> sys.getsizeof(c[0])
24
>>> c
bytearray(b'\n')
>>> sys.getsizeof(c)
50

I have searched the data types available in Python, but I cannot find one for which sys.getsizeof() returns 1. Is there a space-efficient way of storing such integers?

1
  • There are compact arrays in the standard array module. But if you also want fast operations on your data, you might as well use Numpy. Commented Mar 2, 2016 at 19:37
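As a rough illustration of the standard array module mentioned in this comment, here is a minimal sketch. The variable name row and the sample values are only illustrative.

```python
from array import array

# Typecode 'B' is an unsigned char: one byte per item, values 0-255.
row = array('B', [0, 10, 255])
print(row.itemsize)             # bytes per stored item
print(len(row) * row.itemsize)  # bytes used by the items themselves

# Values outside 0-255 are rejected rather than silently truncated.
try:
    row.append(256)
except OverflowError:
    print("256 does not fit in one byte")
```

The array object itself still carries some fixed overhead, so the saving only shows up for long sequences.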

4 Answers 4

4

sys.getsizeof(c[0]) doesn't report the actual amount of memory used to store the first element of c. Accessing c[0] makes Python construct an integer object (or fetch one from the small integer cache) to represent the value, but the bytearray does store the value as one byte.

This is more obvious with a larger bytearray:

>>> sys.getsizeof(bytearray([5]*1000))
1168

You can see that this bytearray couldn't possibly be using more than 1 byte per element, or it would be at least 2000 bytes in size. (The excess space is due to overallocation to accommodate additional elements, and some object overhead.)
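One way to check this yourself is to compare two bytearrays of different lengths, so the fixed object overhead cancels out. This is only a sketch: the exact totals depend on the Python version and platform, so just the difference is meaningful.

```python
import sys

# Two zero-filled bytearrays that differ by 1000 elements.
small = bytearray(1000)
large = bytearray(2000)

# The fixed overhead is the same for both, so the difference
# approximates the cost of the 1000 extra elements.
extra = sys.getsizeof(large) - sys.getsizeof(small)
per_element = extra / 1000.0
print(per_element)
```

On CPython the result should come out at about one byte per element.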


2

You can use numpy arrays for that. E.g.:

import numpy as np

byte_array = np.empty(10, np.uint8) # an array of 10 uninitialized bytes

See other numpy array constructors for more details.
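Since the question mentions an array of arrays, a two-dimensional uint8 array may be a closer fit. A minimal sketch, assuming all rows have the same length (the shape 1000 x 500 is made up for illustration):

```python
import numpy as np

# Hypothetical shape: 1000 rows of 500 values each, all initialized to 0.
grid = np.zeros((1000, 500), dtype=np.uint8)
grid[0, 0] = 255

print(grid.itemsize)  # bytes per element
print(grid.nbytes)    # bytes used by the data buffer
```

If the rows have different lengths, a list of separate uint8 arrays (or bytearrays) would be needed instead, at the cost of per-row overhead.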

2 Comments

That's not an array of 10 bytes. It has a single (1 byte) element.
@Alex Mistyped the constructor function.
1

If you are dealing with huge arrays, you will probably be best off using NumPy, which also provides many tools for working with arrays.

There is some overhead but it is minimal:

import numpy as np
import sys

a = np.array([0]*10000, np.uint8)
len(a)
# 10000
sys.getsizeof(a)
# 10048
sys.getsizeof(a[0])
# 13
a = np.array([0]*1000000, np.uint8)
sys.getsizeof(a)
# 1000048


1

There is a bytes class for the purpose of storing a packed sequence of bytes. I don't think there's an easy way of storing just a single number using one byte of memory.

Documentation for bytes

>>> bytes.fromhex('2Ef0 F1f2  ')
b'.\xf0\xf1\xf2'

>>> sys.getsizeof(bytes.fromhex(''))
33
>>> sys.getsizeof(bytes.fromhex('dead'))
35
>>> sys.getsizeof(bytes.fromhex('deadbeef'))
37
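Note that bytes objects are immutable, so if the values need to change after creation, a bytearray is probably the better fit. A small sketch of the difference, assuming Python 3 (the names packed and editable are just for illustration):

```python
# Immutable packed sequence built from integers in the range 0-255.
packed = bytes([0, 10, 255])
print(packed[1])  # indexing yields a plain int

try:
    packed[1] = 20
except TypeError:
    print("bytes objects cannot be modified in place")

# A bytearray copy supports item assignment.
editable = bytearray(packed)
editable[1] = 20
print(list(editable))
```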
