3

I have a very big list of 0's and 1's that are represented as integers - by default - by python, I think: [randint(0, 1) for i in range(50*98)]

And I want to optimize the code so that it utilizes much less memory. The obvious way is to use just 1 bit to represent each of these numbers.

Is it possible to build a list of real binary numbers in python?

Regards, Bruno

EDIT: Thank you all.
From the answers I found out Python doesn't do this by default, so I found this library (that is installed by Macports on OSX so it saves me some trouble) that does bit operations: python-bitstring

6
  • 3
    What would the difference between a "real binary number" and a "normal number" be? Commented Sep 5, 2011 at 11:48
  • 1
    Do you mean a bitset/bitstring? stackoverflow.com/questions/3946086/… Commented Sep 5, 2011 at 11:50
  • A real binary number uses just 1 bit of memory. A normal number, so an integer, utilizes at least 16 bits in Python as far as I know. Commented Sep 5, 2011 at 11:51
  • 2
    You can't adress a single bit of memory anyway, regardless of the language used. If you want an N-bit number, N has to be a multiple of 8, even if you ignore them. Commented Sep 5, 2011 at 11:56
  • Is memory really a problem, or do you just think it is? 50*98 digits is a pretty small amount of memory. Commented Sep 5, 2011 at 12:29

5 Answers 5

4

This uses the bitstring module and constructs a BitArray object from your list:

from bitstring import BitArray
b = BitArray([randint(0, 1) for i in range(50*98)])

Internally this is now stored packed as bytes so will take considerably less memory. You can slice, index, check and set bits etc. with the usual notation and there additional methods such as set, all and any to modify the bits.

To get the data back as a binary string just use b.bin and to get out the byte packed data use b.tobytes() which will pad with zero bits up to a byte boundary.

Sign up to request clarification or add additional context in comments.

1 Comment

Thank you a lot, you saved me going trough the documentation of bitstring. This is exactly what I'll use.
2

As delnan already said in a comment, you will not be able to use real binary numbers if you mean bit-for-bit equivalent memory usage.

Integers (or longs) are of course real binary numbers in the meaning that you can address individual bits (using bit-wise operators, but that is easily hidden in a class). Also, long objects can become arbitrarily large, i.e. you can use them to simulate arbitrarily large bitsets. It is not going to be very fast if you do it in Python, but not very difficult either and a good start.

Using your binary generation scheme from above, you can do the following:

reduce(
    lambda (a, p), b: (b << p | a, p + 1), 
    (random.randint(0, 1) for i in range(50*98)),
    (0, 0)
)[0]

Of course, random supports arbitrarily large upper boundaries, so you can do just that:

r = random.randint(0, 2**(50*98))

This is not entirely the same, since the individual binary digits in the are not independent in the same way they are independent when you create each digit for itself. Then again, knowing you pRNGs work, they are not really independent in the other case, either. If this is a concern for you, you probably should not use the random module at all, but a hardware RNG.

Comments

1

It's called a bit vector, or a bitmap. Try e.g. BitVector. If you want to implement it yourself, you need to use a numeric object, not a list, and use bitwise operations to toggle bits, e.g.

 bitmap = 0
 bit = (1 << 24)
 bitmap |= bit  # enable bit
 bitmap &= ~bit # disable bit

1 Comment

Thanks, it was something like that I was searching for. After understanding from the answers that Python doesn't provide that in the standard install, I did find a library (that is installed by Macports on OSX so it saves me some trouble) that does bit operations. python-bitstring
0

It seems you need some kind of bit-set. I'm not sure if this example entirely suits your needs, but it is worth a try.

1 Comment

Thank you. I found a library (that is installed by Macports on OSX so it saves me some trouble) that does bit operations. python-bitstring
0

Perhaps you could look into a lossless compression scheme for your data? Presumably there will be a lot of redundancy in such a list.

4 Comments

That's true, but altough the strings are a bit big, they are not so big that would compensate for the overhead of using compression.
If the bit array is generated randomly, there wouldn't be any redundancy to compress away in any case.
@vhallac: True if I do it like I want and only use bits. But Alex was also right, because if I have to represent the 0's and 1's in the date with integers, then, even being random compression would help. But even so, it looks simpler to use bit operations. Thanks.
@jbbsm Good point. It never occured to me to compress the integers holding bits. The obvious bit packing "compression" would be almost trivial compared to any compression method you can think of - and would perform better. :)

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.