
I have an array of integers:

[int1, int2, ..., intn]

I want to count how many non-zero bits are in the binary representation of these integers.

For example:

bin(123) -> 0b1111011, there are 6 non-zero bits

Of course I can loop over the list of integers and use the bin() and count('1') functions, but I'm looking for a vectorized way to do it.

  • does this answer your question? stackoverflow.com/questions/8871204/… Commented Sep 18, 2020 at 11:56
  • Also, are these uint8? or any int? Commented Sep 18, 2020 at 12:01

4 Answers


Since numpy, unlike Python, has limited integer sizes, you can adapt the bit-twiddling solution proposed by Óscar López in Fast way of counting non-zero bits in positive integer (originally from here) into a portable, fast solution:

def bit_count(arr):
    # Make the values type-agnostic (as long as it's integers)
    t = arr.dtype.type
    # Build an all-ones mask as a Python int; t(-1) raises OverflowError on NumPy >= 2.0
    mask = (1 << (8 * arr.dtype.itemsize)) - 1
    s55 = t(0x5555555555555555 & mask)  # Add more digits for 128bit support
    s33 = t(0x3333333333333333 & mask)
    s0F = t(0x0F0F0F0F0F0F0F0F & mask)
    s01 = t(0x0101010101010101 & mask)

    arr = arr - ((arr >> 1) & s55)
    arr = (arr & s33) + ((arr >> 2) & s33)
    arr = (arr + (arr >> 4)) & s0F
    return (arr * s01) >> (8 * (arr.itemsize - 1))

The first part of the function truncates the constants 0x5555..., 0x3333..., etc. to the integer type that arr actually consists of. The remainder is just a set of bit-twiddling operations.
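As a self-contained sanity check, the function can be compared against Python's bin(...).count('1') (bit_count is restated below, with the all-ones mask built as a Python int so the snippet also runs on NumPy ≥ 2.0, where t(-1) overflows):

```python
import numpy as np

def bit_count(arr):
    # Truncate the 64-bit constants to the array's integer type
    t = arr.dtype.type
    mask = (1 << (8 * arr.dtype.itemsize)) - 1  # all-ones for this width
    s55 = t(0x5555555555555555 & mask)
    s33 = t(0x3333333333333333 & mask)
    s0F = t(0x0F0F0F0F0F0F0F0F & mask)
    s01 = t(0x0101010101010101 & mask)
    arr = arr - ((arr >> 1) & s55)
    arr = (arr & s33) + ((arr >> 2) & s33)
    arr = (arr + (arr >> 4)) & s0F
    return (arr * s01) >> (8 * (arr.itemsize - 1))

a = np.array([0, 1, 123, 0xFFFFFFFF], dtype=np.uint32)
reference = [bin(int(n)).count('1') for n in a]
print(list(bit_count(a)))  # [0, 1, 6, 32]
print(reference)           # [0, 1, 6, 32]
```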

This function is about 4.5x faster than Ehsan's method and about 60x faster than Valdi Bo's for an array of 100000 elements:

a = np.random.randint(0, 0xFFFFFFFF, size=100000, dtype=np.uint32)
%timeit bit_count(a).sum()
# 846 µs ± 16.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit m1(a)
# 3.81 ms ± 24 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit m2(a)
# 49.8 ms ± 97.5 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

m1 and m2 taken as-is from @Ehsan's answer.


4 Comments

This is the best so far. However, there is no reason not to leverage dedicated CPU instructions (popcnt and vpopcnt, specifically _mm512_popcnt_epi64). It is not implemented in numpy but eventually should be (maybe another library has such a utility?)
There is a PR for numpy that's been outstanding for a while: github.com/numpy/numpy/pull/21429
@DougT. that PR got merged on Sept 27 2023 and we've had some releases of numpy since then but I still can't find the new ufunc. Any idea where it is?
@Jugdish - I believe it's slated for 2.0, which is a major release this spring
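For reference, that ufunc landed in NumPy 2.0 as np.bitwise_count; a minimal sketch, guarded with hasattr so it is a no-op on older NumPy versions:

```python
import numpy as np

a = np.array([123, 44, 0], dtype=np.uint32)
if hasattr(np, "bitwise_count"):  # available since NumPy 2.0
    # Element-wise popcount, then sum, all in compiled code
    print(np.bitwise_count(a))        # [6 3 0]
    print(np.bitwise_count(a).sum())  # 9
```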

Assuming your array is a, you can simply do:

np.unpackbits(a.view('uint8')).sum()

example:

a = np.array([123, 44], dtype=np.uint8)
#bin(a) is [0b1111011, 0b101100]
np.unpackbits(a.view('uint8')).sum()
#9
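The same one-liner works for wider dtypes, because .view('uint8') reinterprets each element as its raw bytes before np.unpackbits expands them into individual bits (a quick sketch with uint32):

```python
import numpy as np

a = np.array([123, 44], dtype=np.uint32)
bytes_view = a.view('uint8')      # 4 bytes per uint32 element
bits = np.unpackbits(bytes_view)  # one 0/1 entry per bit
print(bits.size)   # 64  (2 elements * 32 bits)
print(bits.sum())  # 9   (6 bits set in 123, 3 in 44)
```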

Comparison using benchit:

#@Ehsan's solution
def m1(a):
  return np.unpackbits(a.view('uint8')).sum()

#@Valdi_Bo's solution
def m2(a):
  return sum([ bin(n).count('1') for n in a ])

in_ = [np.random.randint(100000,size=(n)) for n in [10,100,1000,10000,100000]]

m1 is significantly faster.


1 Comment

Could you add my method to your fancy chart? It shows good performance at large sizes.

In Forth I used a lookup table to count bits per byte. I was searching to see if there was a numpy function to do the bit count and found this answer.

The 256-entry lookup is faster than the two methods here. A 16-bit lookup (65536 entries) is faster again. I run out of space for a 32-bit lookup, which would need about 4.3 GB :-)

This may be useful to somebody else who finds this answer. The one-liners in the other answers are far faster to type.

import numpy as np

def make_n_bit_lookup( bits = 8 ):
    """ Creates a lookup table of bits per byte ( or per 2 bytes for bits = 16).
        returns a count function that uses the table generated.
    """
    try:
        dtype = { 8: np.uint8, 16: np.uint16 }[ bits ]
    except KeyError:
        raise ValueError( 'Parameter bits must be 8 or 16.')

    bits_per_byte = np.zeros( 2**bits, dtype = np.uint8 )

    i = 1
    while i < 2**bits:
        bits_per_byte[ i: i*2 ] = bits_per_byte[ : i ] + 1
        i += i
        # Each power of two adds one bit set to the bit count in the 
        # corresponding index from zero.
        #  n       bits   ct  derived from   i
        #  0       0000   0                  
        #  1       0001   1 = bits[0] + 1    1
        #  2       0010   1 = bits[0] + 1    2
        #  3       0011   2 = bits[1] + 1    2
        #  4       0100   1 = bits[0] + 1    4
        #  5       0101   2 = bits[1] + 1    4
        #  6       0110   2 = bits[2] + 1    4
        #  7       0111   3 = bits[3] + 1    4
        #  8       1000   1 = bits[0] + 1    8
        #  9       1001   2 = bits[1] + 1    8
        #  etc...

    def count_bits_set( arr ):
        """ The function using the lookup table. """
        a = arr.view( dtype )
        return bits_per_byte[ a ].sum()

    return count_bits_set

count_bits_set8  = make_n_bit_lookup( 8 )
count_bits_set16 = make_n_bit_lookup( 16 )

# The two original answers as functions.
def text_count( arr ):
    return sum([ bin(n).count('1') for n in arr ])  

def unpack_count( arr ):
    return np.unpackbits(arr.view('uint8')).sum()   


np.random.seed( 1234 )

max64 = 2**64
arr = np.random.randint( max64, size = 100000, dtype = np.uint64 )

count_bits_set8( arr ), count_bits_set16( arr ), text_count( arr ), unpack_count( arr )                                             
# (3199885, 3199885, 3199885, 3199885) - All the same result

%timeit count_bits_set8( arr )
# 3.63 ms ± 17.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%timeit count_bits_set16( arr )
# 1.78 ms ± 15.4 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%timeit text_count( arr )                                                                                                           
# 83.9 ms ± 1.05 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

%timeit unpack_count( arr )                                                                                                         
# 8.73 ms ± 87.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)



It seems that np.unpackbits runs slower than summing bin(n).count('1') over each element of your source array.

The execution times, measured by %timeit, are:

  • 8.35 µs for np.unpackbits(a.view('uint8')).sum(),
  • 2.45 µs for sum([ bin(n).count('1') for n in a ]) (over 3 times faster).

So maybe you should stay with the original concept.

3 Comments

how big is a here?
I used just the same array as in the answer by Ehsan (2 elements). Maybe for a bigger array the relation of execution times will be different.
@Valdi_Bo Please check out the added comparison to my post. Thank you.
