I have a string that contains printable and unprintable characters, for instance:

'\xe8\x00\x00\x00\x00\x60\xfc\xe8\x89\x00\x00\x00\x60\x89'

What's the most "pythonesque" way to convert this to a bytes object in Python 3, i.e.:

b'\xe8\x00\x00\x00\x00`\xfc\xe8\x89\x00\x00\x00`\x89'
  • All characters are within the range 0-255? Commented Feb 24, 2014 at 22:12

1 Answer

If all your codepoints are within the range U+0000 to U+00FF, you can encode to Latin-1:

inputstring.encode('latin1')

as the first 256 codepoints of Unicode (U+0000 to U+00FF) map one-to-one to the byte values 0-255 in the Latin-1 (ISO-8859-1) standard.

This is by far the fastest method, but it raises a UnicodeEncodeError if the input string contains any character outside that range.
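If the input might contain characters above U+00FF, the failure can be caught and reported. The following is only a sketch: the helper name to_bytes_latin1 and the choice to re-raise as ValueError are assumptions for illustration, not part of the question.

```python
def to_bytes_latin1(text):
    """Encode text one code point per byte (hypothetical helper name)."""
    try:
        return text.encode('latin1')
    except UnicodeEncodeError as exc:
        # exc.start is the index of the first character that could not be encoded
        bad = text[exc.start]
        raise ValueError(
            f"character {bad!r} at index {exc.start} is above U+00FF"
        ) from exc
```

Calling to_bytes_latin1('\xe8\x00\x60') returns the same bytes as the plain encode call, while a string containing, say, '\u20ac' raises a ValueError that names the offending position.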

Basically, if you have a Unicode string containing 'bytes' that should never have been decoded, encode it to Latin-1 to get the original bytes back.

Demo:

>>> '\xe8\x00\x00\x00\x00\x60\xfc\xe8\x89\x00\x00\x00\x60\x89'.encode('latin1')
b'\xe8\x00\x00\x00\x00`\xfc\xe8\x89\x00\x00\x00`\x89'
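The round trip described above can also be checked for every possible byte value. This is a small sketch with illustrative variable names; it assumes the string was originally produced by decoding the bytes as Latin-1 (or an equivalent code-point-per-byte mapping).

```python
# Every byte value from 0 to 255
all_bytes = bytes(range(256))

# Decoding as Latin-1 maps byte N to code point U+00NN
as_text = all_bytes.decode('latin1')

# Encoding back to Latin-1 recovers the original bytes exactly
round_tripped = as_text.encode('latin1')

# An equivalent but slower spelling: take each character's code point directly
via_ord = bytes(map(ord, as_text))
```

Both round_tripped and via_ord compare equal to all_bytes. The bytes(map(ord, ...)) form works on the same range of input and raises ValueError instead of UnicodeEncodeError for characters above U+00FF.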