
Is there an efficient way to get an array of the boolean values stored at the n-th bit position of a bit-packed NumPy array in Python?

  1. Create numpy array with values 0 or 1:
import numpy as np

array = np.array(
    [
     [1, 0, 1],   
     [1, 1, 1], 
     [0, 0, 1],    
    ]
)
  2. Compress it with np.packbits:
pack_array = np.packbits(array, axis=1)
  3. Expected result: some function that can read all values of the n-th column from the bit-packed array. For example, for the second column I would like to get the same as array[:,1]:
array([0, 1, 0])
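For reference, a minimal sketch of the three steps end to end; np.unpackbits (its count argument needs NumPy >= 1.17) can recover the column, but it unpacks every column first, which is what I want to avoid:

import numpy as np

array = np.array(
    [
     [1, 0, 1],
     [1, 1, 1],
     [0, 0, 1],
    ]
)

pack_array = np.packbits(array, axis=1)   # step 2: one byte per row here

# Reference only: unpack everything, then index the wanted column.
col = np.unpackbits(pack_array, axis=1, count=array.shape[1])[:, 1]
print(col)                                # [0 1 0], same as array[:, 1]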

I have tried numba with the following function. It returns the right results, but it is very slow:

import numpy as np
from numba import njit

@njit(nopython=True, fastmath=True)
def getVector(packed, j):
    n = packed.shape[0]
    res = np.zeros(n, dtype=np.int32)
    for i in range(n):
        # Test bit j of row i: byte j//8, bit j%8 (MSB first, the packbits default)
        res[i] = bool(packed[i, j//8] & (128>>(j%8)))
    return res

How to test it?

import numpy as np
import time
from numba import njit

array = np.random.choice(a=[False, True], size=(100000000,15))

pack_array = np.packbits(array, axis=1)

start = time.time()
array[:,10]
print('np array')
print(time.time()-start)

@njit(nopython=True, fastmath=True)
def getVector(packed, j):
    n = packed.shape[0]
    res = np.zeros(n, dtype=np.int32)
    for i in range(n):
        res[i] = bool(packed[i, j//8] & (128>>(j%8)))
    return res

# To initialize
getVector(pack_array, 10)

start = time.time()
getVector(pack_array, 10)
print('getVector')
print(time.time()-start)

It returns:

np array
0.00010132789611816406
getVector
0.15648770332336426
5 Comments
  • You can compute j//8 and 128>>(j%8) one time outside the loop, creating res as np.empty (using dtype=np.bool?). But these are only micro-optimizations, which might already have been done by the compiler. Commented Nov 20, 2022 at 21:18
  • Aside from the modulo operator, I can't imagine this implementation being computationally very slow and not memory bound. Commented Nov 20, 2022 at 21:34
  • The numpy approach is O(1), you can't use that as a baseline. It just returns a view with adjusted strides without any computations (see the sketch after these comments). The timing result should be dominated by the print call. Commented Nov 20, 2022 at 21:57
  • @MichaelSzczesny Is it possible to use strides to index within a byte? From reading the numpy source code it looks like packbits is just a loop over the array. Commented Nov 20, 2022 at 22:10
  • Surprisingly LLVM doesn't optimize the obvious constants inside the loop (on my machine). ~3.5x faster after moving them outside the loop. Commented Nov 20, 2022 at 22:16
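To illustrate the view comment above, a minimal sketch (assuming a small unpacked boolean array like the one in the question) showing that plain column indexing only adjusts strides and copies no data, so it is not a meaningful baseline:

import numpy as np

array = np.random.choice(a=[False, True], size=(1000, 15))

col = array[:, 10]                   # basic slicing: no data is copied
print(col.base is array)             # True: col is a view into array
print(array.strides, col.strides)    # only the strides differ, no per-element work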

1 Answer


Besides some micro-optimisations, I don't believe that there is much that can be optimised here. There are also a few small mistakes in your code:

  • @njit(nopython=True) says the same thing twice (the n in njit already stands for nopython mode); simply @njit or @jit(nopython=True) should be used.
  • fastmath is for "cutting corners" when doing floating-point math; since we are only working with integers and booleans, it can be safely removed because it does nothing for us here.

My updated code (seeing a meagre 40% performance increase on my machine):

import numba as nb
import numpy as np

np.random.seed(0)
array = np.random.choice(a=[False, True], size=(10000000,15))

pack_array = np.packbits(array, axis=1)

@nb.njit(locals={'res': nb.boolean[:]})
def getVector(packed, j):
    n = packed.shape[0]
    # Boolean result array instead of 32-bit integers
    res = np.zeros(n, dtype=nb.boolean)
    # Hoist the byte index and bit mask out of the loop
    byte = j//8
    bit = 128>>(j%8)
    for i in range(n):
        res[i] = bool(packed[i, byte] & bit)
    return res

getVector(pack_array, 10)

In your code, res is an array of 32-bit integers; by giving np.zeros() the Numba (not NumPy) boolean datatype, we can swap it to the more efficient booleans. This is where most of the performance improvement comes from. On my machine, moving the byte index and bit mask outside of the loop had no noticeable effect, but it did have an effect for the commenter @MichaelSzczesny, so it might help you as well.
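For completeness, a small verification and timing sketch (a hypothetical harness, reusing the getVector and pack_array defined above):

import time

getVector(pack_array, 10)                    # warm-up call to trigger compilation

start = time.time()
col = getVector(pack_array, 10)
print('getVector:', time.time() - start)

print(np.array_equal(col, array[:, 10]))     # should print True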

I would not try to use strides, which @Nick ODell is suggesting, because they can be quite dangerous if used incorrectly. (See the numpy documentation).

edit: I have made a few small changes that were suggested by Michael. (Thanks)


4 Comments

Colab Notebook with the optimized implementation I used for my benchmark. Different results are to be expected for different hardware.
Pure numpy (pack_array[:, j // 8] & 128>>(j%8)).astype(bool) is ~2x faster than the original implementation (see the sketch after these comments).
I guess function2 should be called getVector, shouldn't it?
@Jérôme Richard you are right, I fixed it now
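To expand on that pure-NumPy comment, a minimal sketch of the vectorized variant (a sketch only; the name get_column is mine, and timings will vary by machine):

import numpy as np

def get_column(packed, j):
    # Select the byte holding bit j of every row, mask the bit, cast to bool.
    return (packed[:, j // 8] & (128 >> (j % 8))).astype(bool)

array = np.random.choice(a=[False, True], size=(1000, 15))
pack_array = np.packbits(array, axis=1)

print(np.array_equal(get_column(pack_array, 10), array[:, 10]))  # True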
