Which are the advantages of byte objects over string objects in Python?

Question

I understand the differences between bytes/bytearray and str in Python and how to handle, manipulate, and convert these objects, but I cannot find real-life scenarios or examples where you would prefer to work with bytes instead of strings in code.

What are the advantages of bytes objects over string objects in Python? And in which real-life scenarios should you convert strings into bytes in your code, and why?

bytes are for handling raw bytes... str is for handling text. In early programming languages, and indeed in Python 2, strings were just "byte strings". But in a world with multibyte-encoded UTF-8 strings, it is better to have two dedicated types.
– juanpa.arrivillaga, Commented Jun 7, 2021 at 9:17

PhilipG · Accepted Answer · 2021-06-07 09:53:25Z

7

For all modern computer architectures, a byte consists of 8 bits and thus can encode 256 distinct values.

The ASCII character encoding defines only 128 values, and only a subset of those are printable. UTF-8 is a little more complicated, but you end up with a similar problem: not every byte sequence is valid text in the encoding. So any time you have a sequence of bytes that cannot be represented as a string, you have to use bytes or bytearray.
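As a minimal sketch of this point, the snippet below builds a byte sequence that is not valid UTF-8 and shows that decoding it fails while the bytes object itself is perfectly usable. The specific byte values are chosen only for illustration.

```python
# 0xff never appears in well-formed UTF-8, so this sequence cannot be decoded.
raw = bytes([0xFF, 0xFE, 0x00, 0x80])

try:
    raw.decode("utf-8")
    decoded_ok = True
except UnicodeDecodeError:
    decoded_ok = False

print(decoded_ok)  # False

# The bytes object still holds all four values unchanged.
print(list(raw))  # [255, 254, 0, 128]
```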

One example of when you need bytes is cryptography and pseudo-random sequence generation, where you often end up with a sequence of bytes that cannot be represented one-to-one as a string. This is because you want as large an output space as possible when generating pseudo-random numbers and sequences. See, for example, secrets.token_bytes in the stdlib.
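A short sketch of secrets.token_bytes in use. The length of 32 bytes is an arbitrary choice for this example, not a recommendation from the answer.

```python
import secrets

# Ask the OS-backed CSPRNG for 32 random bytes (length chosen for illustration).
key = secrets.token_bytes(32)

print(type(key).__name__)  # bytes
print(len(key))            # 32

# Each element is an int in range(256); any of the 256 values may occur,
# which is why the result is a bytes object and not a str.
print(all(0 <= b <= 255 for b in key))  # True
```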

If you want to represent such a sequence as a string, you can encode it into a sequence of bytes that all fall inside the ASCII range, at the cost of using more bytes. For example, you can encode it as hex characters or in base64. Hex has the advantage that the size of the resulting string is always 2 * n_bytes, while base64 is more compact: it uses about 4 characters for every 3 bytes, so it adds fewer extra bytes than hex. Note that the secrets stdlib module also provides convenience functions that do this conversion for you (for example, secrets.token_hex and secrets.token_urlsafe).
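The size trade-off can be checked directly. This is a sketch using 30 random bytes (a length picked so that base64 needs no padding); both encodings round-trip back to the original bytes.

```python
import base64
import secrets

raw = secrets.token_bytes(30)

# Hex: two ASCII characters per byte.
hex_text = raw.hex()
# Base64: four ASCII characters per three bytes.
b64_text = base64.b64encode(raw).decode("ascii")

print(len(hex_text))  # 60
print(len(b64_text))  # 40

# Both representations convert back to the same bytes.
print(bytes.fromhex(hex_text) == raw)      # True
print(base64.b64decode(b64_text) == raw)   # True

# secrets can also produce the hex text directly.
token = secrets.token_hex(16)
print(len(token))  # 32
```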



1 Comment

PhilipG Over a year ago

Adding another example: whenever you need to read the contents of a non-text file (such as a PNG image) into memory, you cannot read it as a string, because the file may contain bytes that do not decode as text. In that case, open the file in binary mode with open('file.png', 'rb') and read into a bytes object instead.
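A self-contained sketch of the binary-mode read. To avoid depending on a real image, it writes only the eight-byte PNG signature to a temporary file (the file name and temporary directory are illustrative), then reads it back in both modes.

```python
import os
import tempfile

# The eight-byte signature that starts every PNG file.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "file.png")
    with open(path, "wb") as f:
        f.write(PNG_SIGNATURE)

    # Binary mode returns the exact bytes stored on disk.
    with open(path, "rb") as f:
        data = f.read()

    # Text mode tries to decode the bytes and fails on 0x89.
    try:
        with open(path, "r", encoding="utf-8") as f:
            f.read()
        text_read_ok = True
    except UnicodeDecodeError:
        text_read_ok = False

print(data == PNG_SIGNATURE)  # True
print(text_read_ok)           # False
```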

Daweo · Accepted Answer · 2021-06-07 09:33:07Z

2

in which real life scenarios should you convert in your code strings into bytes and why?

One example is a compression algorithm that works on bytes rather than str. Take a look at the examples for the built-in lzma module and note that it works with bytes rather than str. With a lot of text, this allows more efficient use of available memory (i.e. storing the same text in less space).
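A minimal sketch of that workflow with lzma: the text must be encoded to bytes before compression and decoded again after decompression. The sample text is made up and deliberately repetitive, so it compresses well; real savings depend on the input.

```python
import lzma

# Illustrative, highly repetitive sample text.
text = "the quick brown fox jumps over the lazy dog\n" * 1000

# lzma.compress needs a bytes-like object, so encode the str first.
raw = text.encode("utf-8")
compressed = lzma.compress(raw)

print(len(compressed) < len(raw))  # True

# Decompress back to bytes, then decode back to str.
restored = lzma.decompress(compressed).decode("utf-8")
print(restored == text)  # True

# Passing a str directly is rejected.
try:
    lzma.compress(text)
    accepts_str = True
except TypeError:
    accepts_str = False
print(accepts_str)  # False
```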

