I have a binary object:

b'{"node": "\\u041e\\u0431\\u043d\\u043e\\u0432\\u043b\\u0435\\u043d\\u0438\\u0435"}'

and I want it to be printed in Unicode and not strictly using ASCII symbols.

There is a hacky way to do it:

import json

decoded = string.decode()  # `string` is the bytes object shown above
parsed_to_dict = json.loads(decoded)
dumped = json.dumps(parsed_to_dict, ensure_ascii=False)
print(dumped)

>>> {"node": "Обновление"}

However, the text will not always be parseable as JSON, so I need a simpler way.

Is there a way to print my binary object (or a decoded Unicode string) as a non-ASCII string without going through parsing/dumping JSON?

For example, how to print this b'\\u041e\\u0431\\u043d\\u043e\\u0432\\u043b\\u0435\\u043d\\u0438\\u0435' as Обновление?

  • If it may not be parsable as JSON… then what is it? Commented May 18, 2018 at 11:17
  • It's not a string, it's a bytes object. Commented May 18, 2018 at 11:18
  • @deceze It's not unclear what he's asking IMO. They want to remove the escape backslashes to get that result. They're saying they've found a way in the case that it's a json string, but they want a method in the general case. Commented May 18, 2018 at 11:22
  • @FHT Sure, but this example looks like JSON. Both JSON parsing and AST-literal parsing work on that, yes. But if the concern is that in some cases it may not be valid JSON… well then, what will it be? Valid Python which works with AST? Or something entirely different? Commented May 18, 2018 at 11:23
  • I guess you could do data.decode('unicode-escape'). But I'd be wary of recommending that without knowing what variations are possible in the input data. Commented May 18, 2018 at 11:27

2 Answers

A bytes string like

b'\\u041e\\u0431\\u043d\\u043e\\u0432\\u043b\\u0435\\u043d\\u0438\\u0435'

has been encoded using Unicode escape sequences. To convert it back into a proper Unicode string you simply need to specify the 'unicode-escape' codec:

data = b'\\u041e\\u0431\\u043d\\u043e\\u0432\\u043b\\u0435\\u043d\\u0438\\u0435'
out = data.decode('unicode-escape')
print(out)

output

Обновление

However, if data is already a Unicode string, then you first need to encode it to bytes. You can do that using the ascii codec, presuming data only contains ASCII characters. If it contains characters outside ASCII but within the range of \x80 to \xff you may be able to use the 'latin1' codec.

data = '\\u041e\\u0431\\u043d\\u043e\\u0432\\u043b\\u0435\\u043d\\u0438\\u0435'
out = data.encode('ascii').decode('unicode-escape')
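
One caveat worth illustrating: the 'unicode-escape' codec reads any non-ASCII byte as Latin-1, so input that mixes real UTF-8 text with \uXXXX escapes comes out garbled. The sketch below compares the plain decode with a regex-based alternative. The helper name unescape_u and the sample data are assumptions made for illustration, not part of the question. The helper only handles four-digit \u escapes; it does not combine surrogate pairs or treat a doubled backslash specially.

```python
import re

# Matches a literal backslash, the letter u, and exactly four hex digits.
_U_ESCAPE = re.compile(r'\\u([0-9a-fA-F]{4})')

def unescape_u(text: str) -> str:
    """Hypothetical helper: replace backslash-u escapes, keep all other text."""
    return _U_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)

# Sample input (an assumption): real UTF-8 text followed by escaped text.
raw = 'caf\u00e9 \\u041e\\u0431'.encode('utf-8')

naive = raw.decode('unicode-escape')       # UTF-8 bytes are misread as Latin-1
careful = unescape_u(raw.decode('utf-8'))  # decode as UTF-8 first, then unescape

print(naive)    # the accented letter is garbled
print(careful)  # the accented letter survives
```

If the input is known to be pure ASCII, the plain 'unicode-escape' decode above is simpler and sufficient.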

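This should work so long as all the escapes are valid (no lone \) and the text contains no unescaped single quote, since it is wrapped in single quotes before being evaluated.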

import ast
bytes_object = b'{"node": "\\u041e\\u0431\\u043d\\u043e\\u0432\\u043b\\u0435\\u043d\\u0438\\u0435"}}'

unicode_string = ast.literal_eval("'{}'".format(bytes_object.decode()))

output:

'{"node": "Обновление"}}'
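
Because the input is not guaranteed to be well formed, it may help to guard the call. The following is a minimal sketch, assuming that returning a default value is acceptable when the text cannot be evaluated; the helper name unescape_via_literal and its default parameter are hypothetical.

```python
import ast

def unescape_via_literal(raw: bytes, default=None):
    """Hypothetical helper: evaluate the text as a single-quoted Python literal."""
    try:
        return ast.literal_eval("'{}'".format(raw.decode()))
    except (SyntaxError, ValueError):
        # Reached when the text holds a single quote, a trailing backslash,
        # an invalid escape, or bytes that are not valid UTF-8.
        return default

print(unescape_via_literal(b'\\u041e\\u0431'))   # escapes are resolved
print(unescape_via_literal(b"it's \\u041e"))     # falls back to the default
```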
