How do I convert a string into bytes in python?

Question

In my code, I encode a string with UTF-8. I take the output, convert it to a string, and send it to my other program. The other program gets this string, but when I try to decode it, I get an error: AttributeError: 'str' object has no attribute 'decode'. I need to send the encoded data as a string because my other program receives it in a JSON payload. My first program is in Python 3, and the other program is in Python 2.

# my first program
x = u"宇宙"
x = str(x.encode('utf-8'))


# my other program
text = x.decode('utf-8')
print(text)

What should I do to convert the string received by the second program to bytes so the decode works?

If you have a string, decoding has already taken place. Usually this is done transparently by web frameworks; if you need bytes, you should encode it again.
– geckos, Commented Feb 4, 2019 at 1:47
Python 3 here: a string is already decoded (it is Unicode; don't worry about the internal encoding). You encode a string: the encoding transforms each character (e.g. A) into a byte sequence. Note: print can also give unexpected results, because of the encoding expected by the operating system (and locale).
– Giacomo Catenazzi, Commented Feb 4, 2019 at 16:40
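To illustrate the comment above, here is a minimal Python 3 sketch showing that encoding turns text into a byte sequence. With UTF-8, each of these two CJK characters becomes three bytes, and decoding with the same codec reverses the step.

```python
text = "宇宙"                      # a str is already decoded Unicode text
encoded = text.encode("utf-8")     # encoding produces a bytes object

print(type(encoded).__name__)      # bytes
print(encoded)                     # b'\xe5\xae\x87\xe5\xae\x99'
print(len(text), len(encoded))     # 2 6  (2 characters, 6 bytes)

# Decoding with the same codec gives back the original text
print(encoded.decode("utf-8") == text)  # True
```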

jsbueno · Accepted Answer · 2019-02-04 02:32:58Z

The most important piece of information needed to answer this properly is how you pass these objects to the Python 2 program: you are using JSON.

So, stay with me:

After you do the .encode step in program 1, you have a bytes object. By calling str(...) on it, you are just wrapping that bytes object in a layer of escaping (its repr, with the b prefix, the quotes and the \x escapes) and turning it back into a string. When that string is then serialized as JSON and written to a file or transmitted over the network, it is escaped again: JSON escapes every backslash, and any non-ASCII characters are usually escaped with the \u prefix and the code point of each character. The result is that the original Chinese characters end up encoded in UTF-8 and doubly escaped.
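A minimal Python 3 sketch of what str(...) does to a bytes object: it returns the same text as repr(), an ASCII-only string that contains the b prefix, the quotes and the backslash escapes rather than the original characters. Serializing that string to JSON then escapes every backslash a second time.

```python
import json

raw = "宇宙".encode("utf-8")
as_str = str(raw)            # same result as repr(raw); no decoding happens

print(as_str == repr(raw))   # True
print(len(as_str))           # 27: "b", two quotes and six 4-character escapes
print(as_str.isascii())      # True (str.isascii needs Python 3.7+)

# JSON adds a second layer of escaping: each backslash is doubled
print(json.dumps(as_str))    # "b'\\xe5\\xae\\x87\\xe5\\xae\\x99'"
```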

Python's JSON load methods (json.load and json.loads) already decode the contents of JSON data into text strings, so there is no decode step to call afterwards.

In short: to pass data around, simply encode your original text as JSON in the first program, and do not bother with any decoding after json.load in the target Python 2 program:

# my first program
import json

x = "宇宙"
# No str-encode-decode dance needed here.
# ...
data = json.dumps({"example_key": x})  # add any other keys you need to the dict
# code to transmit the json string by network or file as it is...


# my other program
import json

# data is the JSON string received from the first program
text = json.loads(data)["example_key"]
# text is a Unicode text string ready to be used!
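Below is a minimal, self-contained sketch of that round trip, with both programs simulated in one Python 3 script (only the example_key name comes from the answer). By default json.dumps escapes non-ASCII characters as \u sequences, so the JSON text is plain ASCII, and json.loads turns it back into a text string without any decode step; on Python 2 the same call returns a unicode object.

```python
import json

# Program 1: serialize the text directly; no encode()/str() step
x = "宇宙"
data = json.dumps({"example_key": x})
print(data)  # {"example_key": "\u5b87\u5b99"}

# Program 2: parse the JSON; the value is already a text string
text = json.loads(data)["example_key"]
print(text == x)  # True

# With ensure_ascii=False the characters are kept as-is in the JSON text,
# so whatever writes it to a file or socket must then use UTF-8
print(json.dumps({"example_key": x}, ensure_ascii=False) == '{"example_key": "宇宙"}')  # True
```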

The way you are doing it now, you are probably getting the text doubly encoded. I will mimic it on the Python 3 console, printing the result of each step so you can understand the transformations taking place.

In [1]: import json

In [2]: x = "宇宙"

In [3]: print(x.encode("utf-8"))
b'\xe5\xae\x87\xe5\xae\x99'

In [4]: text = str(x.encode("utf-8"))

In [5]: print(text)
b'\xe5\xae\x87\xe5\xae\x99'

In [6]: json_data = json.dumps(text)

In [7]: print(json_data)
"b'\\xe5\\xae\\x87\\xe5\\xae\\x99'"
# as you can see, it is doubly escaped, and it is mostly useless in this form

In [8]: recovered_from_json = json.loads(json_data)

In [9]: print(recovered_from_json)
b'\xe5\xae\x87\xe5\xae\x99'

In [10]: print(repr(recovered_from_json))
"b'\\xe5\\xae\\x87\\xe5\\xae\\x99'"

In [11]: # and if you have data like this in files/databases you need to recover:

In [12]: import ast

In [13]: recovered_text = ast.literal_eval(recovered_from_json).decode("utf-8")

In [14]: print(recovered_text)
宇宙

ljavaly · Accepted Answer · 2024-01-18 21:28:51Z


I had this issue when decoding stringified bytes received in a JSON payload:

print(repr(value))
>>> "b'text-here'"

Encoding the string only compounded the problem by adding another layer of wrapping:

encoded = value.encode("utf-8")
print(encoded)
>>> b"b'text-here'"

And this solution from @jsbueno didn't work for me - I got a json.decoder.JSONDecodeError:

recovered_from_json = json.loads(json_data)

SOLUTION

You need to parse the string as a Python literal with ast.literal_eval to expose the wrapped bytes object:

import ast

as_bytes = ast.literal_eval(value)
print(as_bytes)
>>> b'text-here'

Then you can decode to string:

decoded = as_bytes.decode()
print(decoded)
>>> text-here

As noted by @snakecharmerb, you shouldn't use eval() here, as it opens you up to running potentially dangerous code if the input is not checked or sanitized (see this old but illustrative blog post); ast.literal_eval only evaluates literals.
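A minimal sketch of a more defensive version of this fix, assuming the payload value is the repr of a bytes object (such as "b'text-here'") and that its contents are UTF-8. The function name decode_stringified_bytes is hypothetical. ast.literal_eval refuses anything that is not a literal, and the isinstance check rejects literals that are not bytes.

```python
import ast


def decode_stringified_bytes(value, encoding="utf-8"):
    """Hypothetical helper: turn a string like "b'text-here'" back into text.

    Assumes `value` holds the repr of a bytes object, as produced by str(b"...").
    """
    # literal_eval parses literals only; it raises ValueError or SyntaxError
    # instead of executing arbitrary expressions
    parsed = ast.literal_eval(value)
    if not isinstance(parsed, bytes):
        raise ValueError("expected a bytes literal, got %s" % type(parsed).__name__)
    return parsed.decode(encoding)


print(decode_stringified_bytes("b'text-here'"))  # text-here
print(decode_stringified_bytes(str("宇宙".encode("utf-8"))) == "宇宙")  # True
```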

edited Jan 18, 2024 at 21:28

answered Jan 17, 2024 at 18:48

ljavaly


1 Comment

snakecharmerb Over a year ago

Using eval like this is most unwise - ast.literal_eval is better and safer for this particular case.

Pankaj · Accepted Answer · 2019-02-04 01:59:42Z


Mainly, you are dealing with two different Python versions, and that causes a library issue.

The six library solves this issue.

Six provides simple utilities for wrapping over differences between Python 2 and Python 3. It is intended to support codebases that work on both Python 2 and 3 without modification.

Use this library and decode this way:

import six

def bytes_to_str(s, encoding='utf-8'):
    """Returns a str if a bytes object is given."""
    if six.PY2 and isinstance(s, bytes):
        return s.decode(encoding)
    return s

text = bytes_to_str(x)  # x is the value received from the first program
print(text)

answered Feb 4, 2019 at 1:59

Pankaj


1 Comment

jsbueno Over a year ago

This particular issue has nothing to do with the difference between Python 2 and 3; it is about double encoding. Check my answer for details.
