Is there an efficient way to get an array of the boolean values at the n-th position of a bit-packed array in Python?
- Create numpy array with values 0 or 1:
import numpy as np
array = np.array(
[
[1, 0, 1],
[1, 1, 1],
[0, 0, 1],
]
)
- Compress size by np.packbits:
pack_array = np.packbits(array, axis=1)
- Expected result: a function that returns all values of the n-th column from the packed array. For example, for the second column I would like to get the same result as array[:,1]:
array([0, 1, 0])
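To make the bit layout concrete, here is a minimal NumPy-only sketch of the lookup being asked for. It assumes the default bit order of np.packbits (most significant bit first), and get_column is a hypothetical helper name, not part of NumPy. It has not been benchmarked against the numba version below.

```python
import numpy as np

array = np.array(
    [
        [1, 0, 1],
        [1, 1, 1],
        [0, 0, 1],
    ],
    dtype=np.uint8,
)
# Each row is padded to a whole number of bytes, so the shape is (3, 1).
pack_array = np.packbits(array, axis=1)


def get_column(packed, j):
    """Hypothetical helper: return column j of a row-wise packed array as 0/1."""
    # Column j lives in byte j // 8, at bit j % 8 counted from the most significant bit.
    byte_index, bit_offset = divmod(j, 8)
    return (packed[:, byte_index] >> (7 - bit_offset)) & 1


second_column = get_column(pack_array, 1)
```

With the 3x3 example, second_column should match array[:, 1].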
I have tried numba with the following function. It returns the right results, but it is very slow:
import numpy as np
from numba import njit
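# njit already implies nopython mode, so the redundant nopython=True is dropped
@njit(fastmath=True)
def getVector(packed, j):
    n = packed.shape[0]
    res = np.zeros(n, dtype=np.int32)
    for i in range(n):
        # test bit j of row i (most significant bit first)
        res[i] = bool(packed[i, j//8] & (128>>(j%8)))
    return res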
How to test it?
import numpy as np
import time
from numba import njit
array = np.random.choice(a=[False, True], size=(100000000,15))
pack_array = np.packbits(array, axis=1)
start = time.time()
array[:,10]
print('np array')
print(time.time()-start)
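# njit already implies nopython mode, so the redundant nopython=True is dropped
@njit(fastmath=True)
def getVector(packed, j):
    n = packed.shape[0]
    res = np.zeros(n, dtype=np.int32)
    for i in range(n):
        # test bit j of row i (most significant bit first)
        res[i] = bool(packed[i, j//8] & (128>>(j%8)))
    return res
# Warm-up call so that JIT compilation is not included in the timing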
getVector(pack_array, 10)
start = time.time()
getVector(pack_array, 10)
print('getVector')
print(time.time()-start)
It returns:
np array
0.00010132789611816406
getVector
0.15648770332336426
Possible tweaks: compute j//8 and 128>>(j%8) one time outside the loop, and create res as np.empty (using dtype=np.bool?). But these are only micro-optimizations, which might already have been done by the compiler.

The numpy approach is O(1), so you can't use that as a baseline. It just returns a view with adjusted strides without any computations. The timing result should be dominated by the print call.

LLVM doesn't optimize the obvious constants inside the loop (on my machine). It is ~3.5x faster after moving them outside the loop.
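The hoisting suggested in the comments could look roughly like the sketch below. The name get_vector_hoisted is hypothetical, the boolean result dtype is one reading of the "dtype=np.bool?" suggestion, and the try/except fallback is only there so the snippet still runs (slowly) without numba. The ~3.5x figure is the commenter's measurement and is not reproduced here.

```python
import numpy as np

try:
    from numba import njit
except ImportError:
    # Assumption: fall back to plain Python so the sketch runs without numba.
    def njit(*args, **kwargs):
        return lambda func: func


@njit(fastmath=True)
def get_vector_hoisted(packed, j):
    # Loop-invariant values are computed once instead of on every iteration.
    byte_index = j // 8
    mask = 128 >> (j % 8)
    n = packed.shape[0]
    # np.empty skips zero-filling; every element is overwritten below.
    res = np.empty(n, dtype=np.bool_)
    for i in range(n):
        res[i] = (packed[i, byte_index] & mask) != 0
    return res


# Small random input so the check is quick; the question uses 100000000 rows.
rng = np.random.default_rng(0)
bits = rng.integers(0, 2, size=(1000, 15), dtype=np.uint8)
packed = np.packbits(bits, axis=1)
column = get_vector_hoisted(packed, 10)
```

For a fair baseline on the unpacked side, forcing a copy (for example array[:, 10].copy()) measures actual data movement rather than view creation.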