Fastest Way to convert a Binary String to Binary Array (Array of 1 and 0)

Question

I am trying to find the fastest possible way to convert a binary string to an array of integer 0 and 1. I am currently using python 3.8, and have the following two functions to obtain such array:

import numpy as np
from typing import Literal, Sequence
def string_to_array(Bin_String):
    Bin_array=[int(Bin_String[i],2) for i in range(len(Bin_String))]
    return Bin_array

def string_to_array_LtSq(string: Sequence[Literal['0', '1']]) -> np.ndarray:
    return np.array([int(c) for c in string])

For a string of length 1024, string_to_array_LtSq function takes 20 micro-seconds less than the other (average 370 micro-seconds) though I don't understand why it is faster since both are using int function.

But this is an important part of the code, so is there a faster way in python?

Also, is it possible to do faster in any other language (for example c)? I might switch to that language.

Thanks.

Convert Bitstring (String of 1 and 0s) to numpy array

The fact that you already accept two different types (Python list and NumPy array) as "array". What else might you accept? It's unclear. — Kelly Bundy
– Kelly Bundy, Commented Jul 3, 2022 at 17:35
"act like array" is equally unclear. But the rest of that comment helps a bit. — Kelly Bundy
– Kelly Bundy, Commented Jul 3, 2022 at 17:47
So bytes would also be acceptable, i.e., I don't need to use bytearray? — Kelly Bundy
– Kelly Bundy, Commented Jul 3, 2022 at 17:59

Andrej Kesely · Accepted Answer · 2022-07-03 17:26:29Z

3

Try:

s = '0011'

print(np.frombuffer(s.encode("ascii"), dtype="u1") - 48)

Benchmark:

import numpy as np
from timeit import timeit

s = "1011" * 256  # length = 1024


def f1():
    return np.frombuffer(s.encode("ascii"), dtype="u1") - 48


def f2():
    return np.array([int(c) for c in s])


def f3():
    return list(map(int, s))


def f4():
    return [int(c) for c in s]


t1 = timeit(f1, number=1_000)
t2 = timeit(f2, number=1_000)
t3 = timeit(f3, number=1_000)
t4 = timeit(f4, number=1_000)

print(t1)
print(t2)
print(t3)
print(t4)

Prints:

0.00223864201689139
0.18963027599966154
0.10751374304527417
0.13433810899732634

^{EDIT: added functions which creates only python list (instead of np.array)}

edited Jul 3, 2022 at 17:26

answered Jul 3, 2022 at 17:12

Andrej Kesely

196k15 gold badges60 silver badges105 bronze badges

Sign up to request clarification or add additional context in comments.

4 Comments

Michael Over a year ago

I am using perf_counter, is there any issue with perf_counter?

Kelly Bundy Over a year ago

@Michael This uses perf_counter as well.

Andrej Kesely Over a year ago

@Michael I've added functions which creates python lists for comparison.

Kelly Bundy Over a year ago

You could create a list faster with f1().tolist() (although it seems still slower than my way via bytes).

Kelly Bundy · Accepted Answer · 2022-07-03 17:46:26Z

2

bytearray appears to be even faster than Andrej's NumPy solution. And bytes can be used for a fast list solution. Times with 1024 bits (only showing the first 5):

f1   2.7 μs  [1 0 1 1 1]
f2   2.0 μs  bytearray(b'\x01\x00\x01\x01\x01')
f3   7.6 μs  [1, 0, 1, 1, 1]

Code based on Andrej's (Try it online!):

import numpy as np
from timeit import timeit

s = "1011" * 256  # length = 1024


def f1():
    return np.frombuffer(s.encode("ascii"), dtype="u1") - 48


table = bytearray.maketrans(b'01', b'\x00\x01')

def f2():
    return bytearray(s, "ascii").translate(table)


def f3():
    return [*s.encode().translate(table)]


for _ in range(3):
    for f in f1, f2, f3:
        t = timeit(f, number=1_000)
        t = '%5.1f μs ' % (t * 1e3)
        print(f.__name__, t, f()[:5])
    print()

answered Jul 3, 2022 at 17:46

Kelly Bundy

28k7 gold badges34 silver badges76 bronze badges

5 Comments

Andrej Kesely Over a year ago

Nice solution. If OP doesn't need list or np.array this should be the fastest. On my computer Python 3.9.7/AMD 3700X yours solution is even faster: f1 2.2 μs f2 0.9 μs f3 10.6 μs

Kelly Bundy Over a year ago

@AndrejKesely Yeah, it's a bit odd to do this without really knowing what they need. Especially what they're going to do with it. The different types will also lead to different speeds in their subsequent usage. I feel like that should be taken into consideration as well .

Michael Over a year ago

@AndrejKesely I guess we have everything in this post! but now I am thinking what will be the problem, I need to choose the solution which will not cause any problem in the future, by the way Kelly table should be inside the function..that takes some time, but still yours is faster.

Kelly Bundy Over a year ago

@Michael I disagree. The table can and should be created once and (re)used many times. It should not be in the function.

Kelly Bundy Over a year ago

@AndrejKesely I mean, as an extreme, we could write a class Array whose constructor simply stores the string and whose __getitem__ does the conversion from character to int on the fly. Then I could brag about converting to array in 0.3 μs, but the later usage would suffer. So just benchmarking the conversion to "array" seems rather meaningless.

Collectives™ on Stack Overflow

Fastest Way to convert a Binary String to Binary Array (Array of 1 and 0)

2 Answers 2

4 Comments

5 Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

4 Comments

5 Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related