2

I'd like to use a compact & fast large bitfield in Python, ideally with no or few dependencies other than numpy. The operations I'd like would be roughly equivalent to:

bits = new_bitfield(3000000)  # 3 million bits
bits.set_bit(n, 1)
bits.set_bit(n, 0)
bits.get_bit(n)

I'd like the underlying storage for bits to be very condensed/low-overhead. (Ideally, that bits object will only take a hairs'-breadth over 366.21 Kibibytes.)

I'd like the gets and sets to be very fast, with a minimum of type-checking/type-coercing overhead - perhaps even especially fast when using Cython- or Numba- specific code (or their respective inlining options).

What's the best way to approach C speed/compactness, while retaining as Pythonic of an outward appearance as is possible?

2
  • 1
    "What's the best way to approach C speed/compactness, while retaining as Pythonic of an outward appearance as is possible?" Write a C-extension Commented Jul 8, 2020 at 23:30
  • 1
    It depends what you want to do with them. eg., implementing it in Numba should be quite easy (just syntax changes to that stackoverflow.com/a/30590727/4045774). But this only really makes sense if you call this from a function which is also compiled. Commented Jul 9, 2020 at 9:43

1 Answer 1

0

It should not be too hard to roll it out by yourself, another option is to use existing implementations. For better or worse, std::vector<bool> is exactly what you want: it uses exactly 1 bit per value (thus bool-template parameter is somewhat misleading, as a bool is at least 1 byte long).

Using Cython it could look something like this (it is compiled as a c++-extension):

%%cython -+
from libcpp.vector cimport vector
from libcpp cimport bool

cdef class Bitset:
    cdef vector[bool] bset;
    
    def __cinit__(self, size_t size):
        self.bset.resize(size, False);
        
    cpdef void set_bit(self, size_t pos, bint val) except *:        
        # self.bset[pos] = val would not check out of range
        # self.bset.at(pos) = val doesn't work with Cython
        if pos < self.bset.size():
            self.bset[pos] = val;
        else:
            raise IndexError("out of range access")    
            
    cpdef bint get_bit(self, size_t pos):
        return self.bset.at(pos)

Which can be used as

mybitset = Bitset(10)
mybitset.set_bit(2, True)
mybitset.get_bit(1), mybitset.get_bit(2) #returns (False, True)

Also

mybitset.set_bit(11, True) #throws
mybitset.get_bit(12, True) #throws

throw and don't end in undefined behavior.

Obviously, to keep the overhead at minimum, the code which uses Bitset-cdef-class should be written in Cython as well thus, cdef-part of the interface can be used, without the need of conversion to Python-object, which is needed for the Python-part of the interface (which is demonstrated above).

Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.