4

I have an array of byte-strings in Python 3 (they're audio chunks). I want to make one big byte-string out of them. A simple implementation is kind of slow. How can I do it better?

chunks = []
while not audio.ends():
    chunks.append(bytes(audio.next_buffer()))
    do_some_chunk_processing()

all_audio = b''
for ch in chunks:
    all_audio += ch

How to do it faster?

2
  • Do you mean to run the processing on each loop? Commented Mar 4, 2021 at 13:40
  • Are you sure that piecing together the chunks is what's taking the time? Your main while loop looks like it has the potential of being very slow. Commented Mar 4, 2021 at 13:46

3 Answers

8

Use bytearray()

from time import time

c = b'\x02\x03\x05\x07' * 500 # test data

# Method-1 with bytes-string

bytes_string = b''

st = time()
for _ in range(10**4):
    bytes_string += c

print("string concat -> took {} sec".format(time()-st))

# Method-2 with bytes-array

bytes_arr = bytearray()

st = time()
for _ in range(10**4):
    bytes_arr.extend(c)
# convert byte_arr to bytes_string via
bytes_string = bytes(bytes_arr)

print("bytearray extend/concat -> took {} sec".format(time()-st))

The benchmark on my Win10 / Core i7 (7th gen) machine shows:

string concat -> took 67.28 sec
bytearray extend/concat -> took 0.089 sec

The code is pretty self-explanatory: instead of string += next_block, use bytearray.extend(next_block). After building the bytearray, you can use bytes(bytearray) to get the byte-string. Repeated += on an immutable bytes object copies the whole accumulated result every iteration (quadratic time overall), while a bytearray grows in place with amortized constant-time appends.
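Applied to the question's loop shape, a minimal sketch (using a list of synthetic chunks as a stand-in for the question's audio object, which isn't available here):

```python
# Build the audio incrementally into a mutable bytearray, convert once at the end.
# simulated_buffers stands in for the question's audio.next_buffer() calls.
simulated_buffers = [b'\x02\x03\x05\x07' * 4 for _ in range(1000)]

buf = bytearray()
for chunk in simulated_buffers:   # stands in for "while not audio.ends()"
    buf.extend(chunk)             # amortized O(1) growth, no full copy per chunk
    # ... per-chunk processing would go here ...

all_audio = bytes(buf)            # one final copy to an immutable bytes object
print(len(all_audio))             # 16000
```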


2 Comments

Finally a fast solution, I was adding >50,000 chunks of bytes on the fly and I got a 140x speed up by using bytearray.
This was also much faster for me than b''.join() - but I too was adding many chunks on the fly (not finding all chunks and then concatenating them at the end). My script run time went from 393s to 13s, with the bulk of the time shifting from a[i] = a[i] + b to a regex elsewhere in the code.
5

One approach you could try and measure would be to use bytes.join:

all_audio = b''.join(chunks)

The reason this might be faster is that this does a pre-pass over the chunks to find out how big all_audio needs to be, allocates exactly the right size once, then concatenates it in one go.
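As a quick sanity check (with synthetic chunks, since the question's audio source isn't available), join produces the same bytes as repeated concatenation, in a single pass:

```python
# b''.join() sizes the result once, then copies each chunk in.
chunks = [b'\x00\x01' * 256 for _ in range(100)]

joined = b''.join(chunks)
print(len(joined))   # 51200 bytes (512 per chunk x 100 chunks)

# The question's quadratic += loop gives the same bytes, just slower at scale:
slow = b''
for ch in chunks:
    slow += ch
assert slow == joined
```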



0

One approach is to use an f-string:

all_audio = b''
for ch in chunks:
    all_audio = f'{all_audio}{ch}'

This seems to be faster for small strings, according to this comparison. Note, however, that f-strings produce str, not bytes: formatting a bytes object embeds its textual repr, so this approach does not actually work for the byte-strings in the question.
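A caution worth verifying before using this for audio data: f-strings format values via str(), so applying one to a bytes chunk embeds the textual repr rather than the raw bytes:

```python
chunk = b'\x02\x03'
text = f'{chunk}'

# The result is a str containing the repr of the bytes, not the audio data:
print(text)                   # b'\x02\x03'
print(isinstance(text, str))  # True -- not bytes, unusable as audio
```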

