
I have an array of integers:

[int1, int2, ..., intn]

I want to count how many non-zero bits are in the binary representation of these integers.

For example:

bin(123) -> 0b1111011, there are 6 non-zero bits

Of course I can loop over the list of integers and use the bin() and count('1') functions, but I'm looking for a vectorized way to do it.

  • does this answer your question? stackoverflow.com/questions/8871204/… Commented Sep 18, 2020 at 11:56
  • Also, are these uint8? or any int? Commented Sep 18, 2020 at 12:01

4 Answers


Since numpy, unlike Python, has limited integer sizes, you can adapt the bit-twiddling solution proposed by Óscar López in Fast way of counting non-zero bits in positive integer (originally from here) into a portable, fast solution:

def bit_count(arr):
    # Make the values type-agnostic (as long as it's integers)
    t = arr.dtype.type
    # Build an all-ones mask as a Python int; t(-1) raises OverflowError on NumPy >= 2.0
    mask = (1 << (8 * arr.dtype.itemsize)) - 1
    s55 = t(0x5555555555555555 & mask)  # Add more digits for 128bit support
    s33 = t(0x3333333333333333 & mask)
    s0F = t(0x0F0F0F0F0F0F0F0F & mask)
    s01 = t(0x0101010101010101 & mask)

    arr = arr - ((arr >> 1) & s55)
    arr = (arr & s33) + ((arr >> 2) & s33)
    arr = (arr + (arr >> 4)) & s0F
    return (arr * s01) >> (8 * (arr.itemsize - 1))

The first part of the function truncates the constants 0x5555..., 0x3333..., etc. to the integer type that arr actually consists of. The remainder is just a set of bit-twiddling operations.
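As a self-contained sanity check, the function can be compared against Python's bin(...).count('1') (bit_count is restated below, with the all-ones mask built as a Python int so the snippet also runs on NumPy ≥ 2.0, where t(-1) overflows):

```python
import numpy as np

def bit_count(arr):
    # Truncate the 64-bit constants to the array's integer type
    t = arr.dtype.type
    mask = (1 << (8 * arr.dtype.itemsize)) - 1  # all-ones for this width
    s55 = t(0x5555555555555555 & mask)
    s33 = t(0x3333333333333333 & mask)
    s0F = t(0x0F0F0F0F0F0F0F0F & mask)
    s01 = t(0x0101010101010101 & mask)
    arr = arr - ((arr >> 1) & s55)
    arr = (arr & s33) + ((arr >> 2) & s33)
    arr = (arr + (arr >> 4)) & s0F
    return (arr * s01) >> (8 * (arr.itemsize - 1))

a = np.array([0, 1, 123, 0xFFFFFFFF], dtype=np.uint32)
reference = [bin(int(n)).count('1') for n in a]
print(list(bit_count(a)))  # [0, 1, 6, 32]
print(reference)           # [0, 1, 6, 32]
```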

This function is about 4.5x faster than Ehsan's method and about 60x faster than Valdi Bo's for an array of 100000 elements:

a = np.random.randint(0, 0xFFFFFFFF, size=100000, dtype=np.uint32)
%timeit bit_count(a).sum()
# 846 µs ± 16.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit m1(a)
# 3.81 ms ± 24 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit m2(a)
# 49.8 ms ± 97.5 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

m1 and m2 taken as-is from @Ehsan's answer.


4 Comments

This is the best so far. However, there is no reason not to leverage dedicated CPU instructions (popcnt and vpopcnt, specifically _mm512_popcnt_epi64). It is not implemented in numpy but eventually should be (maybe another library has such a utility?)
There is a PR for numpy that's been outstanding for a while: github.com/numpy/numpy/pull/21429
@DougT. that PR got merged on Sept 27 2023 and we've had some releases of numpy since then but I still can't find the new ufunc. Any idea where it is?
@Jugdish - I believe it's slated for 2.0, which is a major release this spring
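For reference, that ufunc landed in NumPy 2.0 as np.bitwise_count; a minimal sketch, guarded with hasattr so it is a no-op on older NumPy versions:

```python
import numpy as np

a = np.array([123, 44, 0], dtype=np.uint32)
if hasattr(np, "bitwise_count"):  # available since NumPy 2.0
    # Element-wise popcount, then sum, all in compiled code
    print(np.bitwise_count(a))        # [6 3 0]
    print(np.bitwise_count(a).sum())  # 9
```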

Assuming your array is a, you can simply do:

np.unpackbits(a.view('uint8')).sum()

example:

a = np.array([123, 44], dtype=np.uint8)
#bin(a) is [0b1111011, 0b101100]
np.unpackbits(a.view('uint8')).sum()
#9
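The same one-liner works for wider dtypes, because .view('uint8') reinterprets each element as its raw bytes before np.unpackbits expands them into individual bits (a quick sketch with uint32):

```python
import numpy as np

a = np.array([123, 44], dtype=np.uint32)
bytes_view = a.view('uint8')      # 4 bytes per uint32 element
bits = np.unpackbits(bytes_view)  # one 0/1 entry per bit
print(bits.size)   # 64  (2 elements * 32 bits)
print(bits.sum())  # 9   (6 bits set in 123, 3 in 44)
```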

Comparison using benchit:

#@Ehsan's solution
def m1(a):
  return np.unpackbits(a.view('uint8')).sum()

#@Valdi_Bo's solution
def m2(a):
  return sum([ bin(n).count('1') for n in a ])

in_ = [np.random.randint(100000,size=(n)) for n in [10,100,1000,10000,100000]]

m1 is significantly faster.


1 Comment

Could you add my method to your fancy chart? It shows good performance at large sizes.

In Forth I used a lookup table to count bits per byte. I was searching to see if there was a numpy function to do the bit count and found this answer.

The 256-entry lookup is faster than the two methods here. A 16-bit lookup (65536 entries) is faster again. I run out of space for a 32-bit lookup, which would need about 4.3 GB :-)

This may be useful to somebody else who finds this answer. The one-liners in the other answers are far faster to type.

import numpy as np

def make_n_bit_lookup( bits = 8 ):
    """ Creates a lookup table of bits per byte ( or per 2 bytes for bits = 16).
        returns a count function that uses the table generated.
    """
    try:
        dtype = { 8: np.uint8, 16: np.uint16 }[ bits ]
    except KeyError:
        raise ValueError( 'Parameter bits must be 8 or 16.')

    bits_per_byte = np.zeros( 2**bits, dtype = np.uint8 )

    i = 1
    while i < 2**bits:
        bits_per_byte[ i: i*2 ] = bits_per_byte[ : i ] + 1
        i += i
        # Each power of two adds one bit set to the bit count in the 
        # corresponding index from zero.
        #  n       bits   ct  derived from   i
        #  0       0000   0                  
        #  1       0001   1 = bits[0] + 1    1
        #  2       0010   1 = bits[0] + 1    2
        #  3       0011   2 = bits[1] + 1    2
        #  4       0100   1 = bits[0] + 1    4
        #  5       0101   2 = bits[1] + 1    4
        #  6       0110   2 = bits[2] + 1    4
        #  7       0111   3 = bits[3] + 1    4
        #  8       1000   1 = bits[0] + 1    8
        #  9       1001   2 = bits[1] + 1    8
        #  etc...

    def count_bits_set( arr ):
        """ The function using the lookup table. """
        a = arr.view( dtype )
        return bits_per_byte[ a ].sum()

    return count_bits_set

count_bits_set8  = make_n_bit_lookup( 8 )
count_bits_set16 = make_n_bit_lookup( 16 )

# The two original answers as functions.
def text_count( arr ):
    return sum([ bin(n).count('1') for n in arr ])  

def unpack_count( arr ):
    return np.unpackbits(arr.view('uint8')).sum()   


np.random.seed( 1234 )

max64 = 2**64
arr = np.random.randint( max64, size = 100000, dtype = np.uint64 )

count_bits_set8( arr ), count_bits_set16( arr ), text_count( arr ), unpack_count( arr )                                             
# (3199885, 3199885, 3199885, 3199885) - All the same result

%timeit count_bits_set8( arr )
# 3.63 ms ± 17.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%timeit count_bits_set16( arr )
# 1.78 ms ± 15.4 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%timeit text_count( arr )                                                                                                           
# 83.9 ms ± 1.05 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

%timeit unpack_count( arr )                                                                                                         
# 8.73 ms ± 87.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)



It seems that np.unpackbits runs slower than summing bin(n).count('1') over each element of your source array.

The execution times, measured by %timeit, are:

  • 8.35 µs for np.unpackbits(a.view('uint8')).sum(),
  • 2.45 µs for sum([ bin(n).count('1') for n in a ]) (over 3 times faster).

So maybe you should stay with the original concept.

3 Comments

how big is a here?
I used just the same array as in the answer by Ehsan (2 elements). Maybe for a bigger array the relation of execution times will be different.
@Valdi_Bo Please check out the added comparison to my post. Thank you.
