Efficient way to split a bytes array then convert it to string in Python

Question

I have a numpy bytes array containing characters, followed by b'', followed by other characters (including weird characters that raise Unicode errors when decoding):

bytes = numpy.array([b'f', b'o', b'o', b'', b'b', b'a', b'd', b'\xfe', b'\x95', b'', b'\x80', b'\x04', b'\x08', b'\x06'])

I want to get everything before the first b''.

Currently my code is:

txt = []
for c in bytes:
    if c != b'':
        txt.append(c.decode('utf-8'))
    else:
        break
txt = ''.join(txt)

I suppose there is a more efficient and Pythonic way to do that.

By no means a duplicate, but I think you are looking for something like this: stackoverflow.com/q/432112/2988730
– Mad Physicist, Commented Aug 30, 2016 at 13:03

Dimitris Fasarakis Hilliard · Accepted Answer · 2016-08-30 11:10:30Z

4

I like your way: it is explicit, the for loop is understandable by all, and it isn't all that slow compared to other approaches.

Some suggestions I'd make: change your condition from if c != b'' to if c, since a non-empty bytes object is truthy, and don't name your list bytes, because that masks the built-in! Name it bt or something similar :-)

Other options include itertools.takewhile, which grabs elements from an iterable as long as a predicate holds. Your operation would look like:

from itertools import takewhile

"".join(s.decode('utf-8') for s in takewhile(bool, bt))

This is slightly slower but more compact; if you're a one-liner lover, this might appeal to you.
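For anyone who wants to try the takewhile version directly, here is a minimal self-contained sketch. It assumes bt is a plain Python list of single-byte objects modelled on the question's data; the helper name text_before_empty is made up for illustration.

```python
from itertools import takewhile

# Sample data modelled on the question; `bt` is the name suggested above.
bt = [b'f', b'o', b'o', b'', b'b', b'a', b'd', b'\xfe', b'\x95', b'']


def text_before_empty(chunks):
    """Decode and join every chunk that comes before the first empty one.

    takewhile(bool, ...) stops at the first falsy element, and an empty
    bytes object is falsy, so the undecodable bytes after it are never seen.
    """
    return "".join(c.decode('utf-8') for c in takewhile(bool, chunks))


result = text_before_empty(bt)
print(result)  # prints: foo
```

Because iteration stops at the first b'', the bytes that would raise a UnicodeDecodeError are never decoded.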

Slightly faster and also compact is using index along with a slice:

"".join(b.decode('utf-8') for b in bt[:bt.index(b'')])

While compact, it is also less readable.
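A slightly more defensive sketch of the index-and-slice idea, again assuming a plain list (list.index raises ValueError when b'' is absent, so that case is handled explicitly). It also joins the bytes first and decodes once, which is my own variation rather than something from the question; the helper name prefix_before_empty is hypothetical.

```python
def prefix_before_empty(chunks):
    """Return the decoded text before the first b'' in a list of bytes."""
    try:
        end = chunks.index(b'')      # position of the first empty element
    except ValueError:
        end = len(chunks)            # no terminator: take the whole list
    # Join the raw bytes first, then decode once. This also copes with a
    # multi-byte UTF-8 character that is split across several elements.
    return b"".join(chunks[:end]).decode('utf-8')


print(prefix_before_empty([b'f', b'o', b'o', b'', b'\xfe']))  # prints: foo
```

Decoding once after the join avoids per-element decode calls, and it means a character such as 'é' stored as b'\xc3', b'\xa9' still decodes, whereas decoding each element separately would fail on it.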

In short, I'd go with the for loop, since readability counts as very Pythonic in my eyes.



3 Comments

roipoussiere Over a year ago

Thanks for this advice! Oh, in fact the byte array was a numpy array. I like your second solution, but I benchmarked these 3 solutions (with ba[:np.where(ba == b'')[0][0]] instead of ba[:ba.index(b'')]) and it appears that the for loop solution is faster, so I chose it.

Dimitris Fasarakis Hilliard Over a year ago

@user2914540 Oh, I was unaware that it was a numpy array. Maybe add the numpy tag and specify that bytes is a numpy array? There might be more efficient ways to do this in numpy.

roipoussiere Over a year ago

Done. Sorry, this array comes from an external library (netcdf4py), and I only discovered it was a numpy array when I tried to call ab.index().
