
Is there an efficient way to get an array of the boolean values stored at the n-th bit position of a bit-packed NumPy array in Python?

  1. Create numpy array with values 0 or 1:
import numpy as np

array = np.array(
    [
     [1, 0, 1],   
     [1, 1, 1], 
     [0, 0, 1],    
    ]
)
  2. Compress it with np.packbits:
pack_array = np.packbits(array, axis=1)
  3. Expected result: some function that can read all values of the n-th column from the bit-packed array. For example, for the second column I would like to get the same as array[:,1]:
array([0, 1, 0])
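For reference, a minimal sketch of the three steps end to end; np.unpackbits (its count argument needs NumPy >= 1.17) can recover the column, but it unpacks every column first, which is what I want to avoid:

import numpy as np

array = np.array(
    [
     [1, 0, 1],
     [1, 1, 1],
     [0, 0, 1],
    ]
)

pack_array = np.packbits(array, axis=1)   # step 2: one byte per row here

# Reference only: unpack everything, then index the wanted column.
col = np.unpackbits(pack_array, axis=1, count=array.shape[1])[:, 1]
print(col)                                # [0 1 0], same as array[:, 1]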

I have tried numba with the following function. It returns the right results, but it is very slow:

import numpy as np
from numba import njit

@njit(nopython=True, fastmath=True)
def getVector(packed, j):
    n = packed.shape[0]
    res = np.zeros(n, dtype=np.int32)
    for i in range(n):
        # Test bit j of row i: byte j//8, bit j%8 (MSB first, the packbits default)
        res[i] = bool(packed[i, j//8] & (128>>(j%8)))
    return res

How to test it?

import numpy as np
import time
from numba import njit

array = np.random.choice(a=[False, True], size=(100000000,15))

pack_array = np.packbits(array, axis=1)

start = time.time()
array[:,10]
print('np array')
print(time.time()-start)

@njit(nopython=True, fastmath=True)
def getVector(packed, j):
    n = packed.shape[0]
    res = np.zeros(n, dtype=np.int32)
    for i in range(n):
        res[i] = bool(packed[i, j//8] & (128>>(j%8)))
    return res

# To initialize
getVector(pack_array, 10)

start = time.time()
getVector(pack_array, 10)
print('getVector')
print(time.time()-start)

It returns:

np array
0.00010132789611816406
getVector
0.15648770332336426
5 Comments
  • You can compute j//8 and 128>>(j%8) one time outside the loop, creating res as np.empty (using dtype=np.bool?). But these are only micro-optimizations, which might already have been done by the compiler. Commented Nov 20, 2022 at 21:18
  • Aside from the modulo operator, I can't imagine this implementation being computationally very slow and not memory bound. Commented Nov 20, 2022 at 21:34
  • The numpy approach is O(1), you can't use that as a baseline. It just returns a view with adjusted strides without any computations (see the sketch after these comments). The timing result should be dominated by the print call. Commented Nov 20, 2022 at 21:57
  • @MichaelSzczesny Is it possible to use strides to index within a byte? From reading the numpy source code it looks like packbits is just a loop over the array. Commented Nov 20, 2022 at 22:10
  • Surprisingly LLVM doesn't optimize the obvious constants inside the loop (on my machine). ~3.5x faster after moving them outside the loop. Commented Nov 20, 2022 at 22:16
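To illustrate the view comment above, a minimal sketch (assuming a small unpacked boolean array like the one in the question) showing that plain column indexing only adjusts strides and copies no data, so it is not a meaningful baseline:

import numpy as np

array = np.random.choice(a=[False, True], size=(1000, 15))

col = array[:, 10]                   # basic slicing: no data is copied
print(col.base is array)             # True: col is a view into array
print(array.strides, col.strides)    # only the strides differ, no per-element work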

1 Answer


Besides some micro-optimisations, I don't believe that there is much that can be optimised here. There are also a few small mistakes in your code:

  • @njit(nopython=True) says the same thing twice (the n in njit already stands for nopython mode); simply @njit or @jit(nopython=True) should be used.
  • fastmath is for "cutting corners" when doing floating-point math; since we are only working with integers and booleans, it can be safely removed because it does nothing for us here.

My updated code (seeing a meagre 40% performance increase on my machine):

import numba as nb
import numpy as np

np.random.seed(0)
array = np.random.choice(a=[False, True], size=(10000000,15))

pack_array = np.packbits(array, axis=1)

@nb.njit(locals={'res': nb.boolean[:]})
def getVector(packed, j):
    n = packed.shape[0]
    # Boolean result array instead of 32-bit integers
    res = np.zeros(n, dtype=nb.boolean)
    # Hoist the byte index and bit mask out of the loop
    byte = j//8
    bit = 128>>(j%8)
    for i in range(n):
        res[i] = bool(packed[i, byte] & bit)
    return res

getVector(pack_array, 10)

In your code, res is an array of 32-bit integers; by giving np.zeros() the Numba (not NumPy) boolean datatype, we can swap it to the more efficient booleans. This is where most of the performance improvement comes from. On my machine, moving the byte index and bit mask outside of the loop had no noticeable effect, but it did have an effect for the commenter @MichaelSzczesny, so it might help you as well.
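For completeness, a small verification and timing sketch (a hypothetical harness, reusing the getVector and pack_array defined above):

import time

getVector(pack_array, 10)                    # warm-up call to trigger compilation

start = time.time()
col = getVector(pack_array, 10)
print('getVector:', time.time() - start)

print(np.array_equal(col, array[:, 10]))     # should print True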

I would not try to use strides, which @Nick ODell is suggesting, because they can be quite dangerous if used incorrectly. (See the numpy documentation).

edit: I have made a few small changes that were suggested by Michael. (Thanks)


4 Comments

Colab Notebook with the optimized implementation I used for my benchmark. Different results are to be expected for different hardware.
Pure numpy (pack_array[:, j // 8] & 128>>(j%8)).astype(bool) is ~2x faster than the original implementation (see the sketch after these comments).
I guess function2 should be called getVector, shouldn't it?
@Jérôme Richard you are right, I fixed it now
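To expand on that pure-NumPy comment, a minimal sketch of the vectorized variant (a sketch only; the name get_column is mine, and timings will vary by machine):

import numpy as np

def get_column(packed, j):
    # Select the byte holding bit j of every row, mask the bit, cast to bool.
    return (packed[:, j // 8] & (128 >> (j % 8))).astype(bool)

array = np.random.choice(a=[False, True], size=(1000, 15))
pack_array = np.packbits(array, axis=1)

print(np.array_equal(get_column(pack_array, 10), array[:, 10]))  # True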
