I am using Python to receive a string via UDP. From each character in the string I need to extract the four pairs of bits and convert them to integers.

For example, if the first character in the string was "J", this is ASCII 0x4a or 0b01001010. So I would extract the pairs of bits [01, 00, 10, 10], which would be converted to [1, 0, 2, 2].

Speed is my number one priority here, so I am looking for a fast way to accomplish this.

Any help is much appreciated, thank you.

3 Comments
  • Possible duplicate of bits to string python Commented Apr 4, 2019 at 19:55
  • You can use all the basic bit operations as you can in C in numpy. For example, a & 0x3 will give you the first two bits, a & 0xc the next two, etc. Commented Apr 4, 2019 at 20:14
  • @Brad Solomon It's one character in a str. I have edited the question to hopefully make this clearer. Commented Apr 4, 2019 at 20:39
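The masking idea from the comment above can be sketched as follows. Note that a mask alone leaves the higher pairs in their original bit positions (for example, a & 0xc yields 0, 4, 8 or 12), so this sketch shifts each pair down first and then masks with 0x3. The function name bitpairs_shift and the pair order (most significant pair first, matching the question's example) are assumptions for illustration, not code from the thread.

```python
import numpy as np

def bitpairs_shift(data: bytes) -> np.ndarray:
    """Return the four 2-bit fields of every byte, most significant pair first."""
    a = np.frombuffer(data, dtype=np.uint8)
    shifts = np.array([6, 4, 2, 0], dtype=np.uint8)
    # Broadcasting: one row per input byte, one column per bit pair.
    return (a[:, None] >> shifts) & 0x3

print(bitpairs_shift(b"J").tolist())  # [[1, 0, 2, 2]]
```

The result has shape (n, 4), one row per received byte. Whether this is faster than other approaches would need to be measured on your own data.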

2 Answers

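You can use np.unpackbits:

import numpy as np

def bitpairs(a):
    # a: uint8 array; unpackbits yields one 0/1 element per bit, MSB first
    bf = np.unpackbits(a)
    # combine each (high, low) bit pair into an integer in 0..3
    return bf[1::2] + (bf[::2]<<1)
    ### or: return bf[1::2] | (bf[::2]<<1) but doesn't seem faster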

### small example
bitpairs(np.frombuffer(b'J', 'u1'))
# array([1, 0, 2, 2], dtype=uint8)

### large example
from timeit import timeit
from string import ascii_letters as L
S = np.random.choice(np.array(list(L), 'S1'), 1000000).view('S1000000').item(0)
### one very long byte string
S[:10], S[999990:]
# (b'fhhgXJltDu', b'AQGTlpytHo')
timeit(lambda: bitpairs(np.frombuffer(S, 'u1')), number=1000)
# 8.226706639004988  (total seconds for 1000 calls, i.e. about 8.2 ms per 1,000,000-byte string)
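
To see why the slicing in bitpairs works, here is a small walk-through of the intermediate arrays for a two-byte input. The variable names high, low and per_char are only illustrative, and the final reshape is an optional extra (not part of the answer's function) that groups the flat result into one row per character.

```python
import numpy as np

a = np.frombuffer(b"JK", dtype=np.uint8)  # bytes 0x4a, 0x4b
bf = np.unpackbits(a)                      # one element per bit, MSB first
high = bf[::2]                             # first bit of each pair
low = bf[1::2]                             # second bit of each pair
pairs = low + (high << 1)                  # flat array, four values per byte
per_char = pairs.reshape(-1, 4)            # one row per input byte

print(bf.tolist())        # [0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 0, 1, 1]
print(pairs.tolist())     # [1, 0, 2, 2, 1, 0, 2, 3]
print(per_char.tolist())  # [[1, 0, 2, 2], [1, 0, 2, 3]]
```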

1 Comment

This is exactly what I was looking for. Thank you.

You can slice the string and convert to int assuming base 2:

>>> byt = '11100100'
>>> [int(b, 2) for b in (byt[0:2], byt[2:4], byt[4:6], byt[6:8])]
[3, 2, 1, 0]

This assumes that byt is always an 8-character str, rather than the int formed by the binary literal 0b11100100.

A more general solution might look something like:

>>> def get_int_slices(b: str) -> list:
...     return [int(b[i:i+2], 2) for i in range(0, len(b), 2)]
... 
>>> get_int_slices('1110010011100100111001001110010011100100')
[3, 2, 1, 0, 3, 2, 1, 0, 3, 2, 1, 0, 3, 2, 1, 0, 3, 2, 1, 0]

The int(x, 2) call says, "interpret the input as being in base 2."
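
Since the question starts from a character rather than from a string of '0' and '1' digits, one possible bridge is to format the character's code as an 8-digit binary string first. This is a minimal sketch, not part of the original answer: char_to_pairs is a hypothetical helper, and it assumes every character's code point fits in one byte (0 to 255).

```python
def char_to_pairs(ch: str) -> list:
    """Split one character's 8-bit code into four 2-bit integers."""
    code = ord(ch)
    if code > 0xFF:
        # Assumption: the input holds single-byte characters only.
        raise ValueError("character does not fit in one byte")
    bits = format(code, "08b")  # e.g. 'J' -> '01001010'
    return [int(bits[i:i + 2], 2) for i in range(0, 8, 2)]

print(char_to_pairs("J"))  # [1, 0, 2, 2]
```

This goes through string formatting for every character, so it is likely to be much slower than a vectorized approach on long inputs.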


*To my knowledge, none of my answers have ever won a speed race against Paul Panzer's, and this one is probably no exception.

2 Comments

A shorter version of your list comprehension: [int(b[i:i+2], 2) for i in range(0, len(b), 2)]
Thanks @ScottBoston, that's much cleaner
