10

I want to create a 2D Binary (Bit) Array in Python in a space and time efficient way as my 2D bitarray would be around 1 Million(rows) * 50000(columns of 0's or 1's) and also I would be performing bitwise operations over these huge elements. My array would look something like:

0 1 0 1
1 1 1 0
1 0 0 0 
...

In C++ most efficient way (space) for me would be to create a kind of array of integers where each element represents 32 bits and then I can use the shift operators coupled with bit-wise operators to carry operations.

Now I know that there is a bitarray module in python. But I am unable to create a 2D structure using list of bit-arrays. How can I do this?

Another way I know in C++ would be to create a map something like map<id, vector<int> > where then I can manipulate the vector as I have mentioned above. Should I use the dictionary equivalent in python?

Even if you suggest me some way out to use bit array for this task it will be great If I can get to know whether I can have multiple threads operate on a splice of bitarray so that I can make it multithreaded. Thanks for the help!!

EDIT:

I can even go on creating my own data structure for this if the need be. However just wanted to check before reinventing the wheel.

6
  • 2
    Is it a sparse array? Otherwise you are going to need ~6GB to store all those bits Commented May 1, 2012 at 10:36
  • 1
    bitwise is redundant when the datatype is bits :). Perhaps you can just use a set and normal set operations. Membership of the set can represent True Commented May 1, 2012 at 10:45
  • 1
    The bitwise operations apply only when you are stuck on the idea that you need to use 32bit ints (or similar) to back the bits into. Commented May 1, 2012 at 10:56
  • @yavar you don't have the number 32 you have a bit in the 6th least significant column. the fact that this represents the number 32 when interpreted as an integer is irrelevant. You will use massively more memory using ints than a sparse set of booleans if the array is sparse Commented May 1, 2012 at 11:02
  • The answer given by AIX does 100% answer my question however the one given by gnibbler does solve the problem in a space/time efficient way. So I toggle the selected answer but I wish I could have selected both the answers as my appropriate answer :) Commented May 1, 2012 at 12:47

3 Answers 3

5

As per my comment, you may be able to use sets

0 1 0 1
1 1 1 0
1 0 0 0 

can be represented as

set([(1,0), (3,0), (0,1), (1,1), (2, 1), (0,2)])

or

{(1,0), (3,0), (0,1), (1,1), (2, 1), (0,2)}

an AND is equivalent to an intersection of 2 sets
OR is the union of the 2 sets

Sign up to request clarification or add additional context in comments.

2 Comments

It would be (3,0) instead of (4,0).
@Yavar, they represent the (x,y) of each 1. y is counting down from the top in this case, but you can switch them round whichever way suits as long as you are consistent
5

How about the following:

In [11]: from bitarray import bitarray

In [12]: arr = [bitarray(50) for i in xrange(10)]

This creates a 10x50 bit array, which you can access as follows:

In [15]: arr[0][1] = True

In [16]: arr[0][1]
Out[16]: True

Bear in mind that a 1Mx50K array would require about 6GB of memory (and a 64-bit build of Python on a 64-bit OS).

whether I can have multiple threads operate on a splice of bitarray so that I can make it multithreaded

That shouldn't be a problem, with the usual caveats. Bear in mind that due to the GIL you are unlikely to achieve performance improvements through multithreading.

Comments

3

Can you use numpy?

>>> import numpy
>>> A = numpy.zeros((50000, 1000000), dtype=bool)

EDIT: Doesn't seem to be the most space efficient. Uses 50GB (1-byte per bool). Does anyone know if numpy has a way to use packed bools?

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.