1

I am trying to find the fastest possible way to convert a binary string to an array of integer 0 and 1. I am currently using python 3.8, and have the following two functions to obtain such array:

import numpy as np
from typing import Literal, Sequence
def string_to_array(Bin_String):
    Bin_array=[int(Bin_String[i],2) for i in range(len(Bin_String))]
    return Bin_array

def string_to_array_LtSq(string: Sequence[Literal['0', '1']]) -> np.ndarray:
    return np.array([int(c) for c in string])

For a string of length 1024, string_to_array_LtSq function takes 20 micro-seconds less than the other (average 370 micro-seconds) though I don't understand why it is faster since both are using int function.

But this is an important part of the code, so is there a faster way in python?

Also, is it possible to do faster in any other language (for example c)? I might switch to that language.

Thanks.

Related Post:

  1. Convert Bitstring (String of 1 and 0s) to numpy array
8
  • 1
    list(map(int, string)) Commented Jul 3, 2022 at 17:05
  • Define "array". Commented Jul 3, 2022 at 17:06
  • 1
    The fact that you already accept two different types (Python list and NumPy array) as "array". What else might you accept? It's unclear. Commented Jul 3, 2022 at 17:35
  • 1
    "act like array" is equally unclear. But the rest of that comment helps a bit. Commented Jul 3, 2022 at 17:47
  • 1
    So bytes would also be acceptable, i.e., I don't need to use bytearray? Commented Jul 3, 2022 at 17:59

2 Answers 2

3

Try:

s = '0011'

print(np.frombuffer(s.encode("ascii"), dtype="u1") - 48)

Benchmark:

import numpy as np
from timeit import timeit

s = "1011" * 256  # length = 1024


def f1():
    return np.frombuffer(s.encode("ascii"), dtype="u1") - 48


def f2():
    return np.array([int(c) for c in s])


def f3():
    return list(map(int, s))


def f4():
    return [int(c) for c in s]


t1 = timeit(f1, number=1_000)
t2 = timeit(f2, number=1_000)
t3 = timeit(f3, number=1_000)
t4 = timeit(f4, number=1_000)

print(t1)
print(t2)
print(t3)
print(t4)

Prints:

0.00223864201689139
0.18963027599966154
0.10751374304527417
0.13433810899732634

EDIT: added functions which creates only python list (instead of np.array)

Sign up to request clarification or add additional context in comments.

4 Comments

I am using perf_counter, is there any issue with perf_counter?
@Michael This uses perf_counter as well.
@Michael I've added functions which creates python lists for comparison.
You could create a list faster with f1().tolist() (although it seems still slower than my way via bytes).
2

bytearray appears to be even faster than Andrej's NumPy solution. And bytes can be used for a fast list solution. Times with 1024 bits (only showing the first 5):

f1   2.7 μs  [1 0 1 1 1]
f2   2.0 μs  bytearray(b'\x01\x00\x01\x01\x01')
f3   7.6 μs  [1, 0, 1, 1, 1]

Code based on Andrej's (Try it online!):

import numpy as np
from timeit import timeit

s = "1011" * 256  # length = 1024


def f1():
    return np.frombuffer(s.encode("ascii"), dtype="u1") - 48


table = bytearray.maketrans(b'01', b'\x00\x01')

def f2():
    return bytearray(s, "ascii").translate(table)


def f3():
    return [*s.encode().translate(table)]


for _ in range(3):
    for f in f1, f2, f3:
        t = timeit(f, number=1_000)
        t = '%5.1f μs ' % (t * 1e3)
        print(f.__name__, t, f()[:5])
    print()

5 Comments

Nice solution. If OP doesn't need list or np.array this should be the fastest. On my computer Python 3.9.7/AMD 3700X yours solution is even faster: f1 2.2 μs f2 0.9 μs f3 10.6 μs
@AndrejKesely Yeah, it's a bit odd to do this without really knowing what they need. Especially what they're going to do with it. The different types will also lead to different speeds in their subsequent usage. I feel like that should be taken into consideration as well .
@AndrejKesely I guess we have everything in this post! but now I am thinking what will be the problem, I need to choose the solution which will not cause any problem in the future, by the way Kelly table should be inside the function..that takes some time, but still yours is faster.
@Michael I disagree. The table can and should be created once and (re)used many times. It should not be in the function.
@AndrejKesely I mean, as an extreme, we could write a class Array whose constructor simply stores the string and whose __getitem__ does the conversion from character to int on the fly. Then I could brag about converting to array in 0.3 μs, but the later usage would suffer. So just benchmarking the conversion to "array" seems rather meaningless.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.