0

When I output Chinese characters in Python (pandas), they are displayed as shown below:

\xe8\xbf\x99\xe7\xa7\x8d\xe6\x83\x85\xe5\x86\xb5\xe6\x98\xaf\xe6\xb2\xb9\xe6\xb3\xb5\xe6\x95\x85\xe9\x9a\x9c\xe7\x81\xaf\xef\xbc\x8c\xe6\xa3\x80\xe6\x9f\xa5\xe4\xb8\x80\xe4\xb8\x8b\xe6\xb2\xb9\xe6\xb3\xb5\xe6\x8f\x92\xe5\xa4\xb4\xe6\x98\xaf\xe5\x90\xa6\xe6\x8e\xa5\xe8\x99\x9a\xef\xbc\x8c\xe7\x84\xb6\xe5\x90\x8e\xe6\x9f\xa5\xe4\xb8\x80\xe4\xb8\x8b\xe6\xb2\xb9\xe6\xb3\xb5\xe5\x86\x85\xe7\xae\xa1\xe9\x81\x93\xe5\x8e\x8b\xe5\x8a\x9b\xe6\x98\xaf\xe5\x90\xa6\xe7\xac\xa6\xe5\x90\x88\xe6\xad\xa3\xe5\xb8\xb8\xe5\x80\xbc\xe3\x80\x82

What encoding is this? As far as I know, it is not Unicode. Thanks!

7
  • 1
    Try putting # -*- coding: utf-8 -*- at the top of your Python source file to force Python into UTF-8 Commented Jul 13, 2018 at 22:24
  • 3
    that's hexadecimal Commented Jul 13, 2018 at 22:25
  • 2
    @Ben A coding directive only affects how the interpreter decodes the script itself, it has no effect on what the script does to external data that it reads or writes. Commented Jul 13, 2018 at 22:25
  • 1
    That looks like UTF-8 encoded Chinese to me, although I don't read Chinese. 这种情况是油泵故障灯,检查一下油泵插头是否接虚,然后查一下油泵内管道压力是否符合正常值。 Commented Jul 13, 2018 at 22:28
  • 1
    Surely those online tools want to know what the encoding is as well? Commented Jul 14, 2018 at 0:01

3 Answers

1

What you are seeing is the representation of a bytes object. To turn it into readable text, decode it with output.decode('utf-8').

For example:

output = b'\xe8\xbf\x99\xe7...'
unicode_output = output.decode('utf-8')
print(unicode_output)

This prints the decoded Chinese text (I cannot include it here because it counts as spam).

The same can be done in one line: print(b'\xe8\xbf\x99\xe7...'.decode('utf-8')).
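As a small runnable illustration of the decode step, here is a sketch that uses only the first twelve bytes of the output from the question, so nothing is elided. The variable names raw and text are just placeholders.

```python
# First twelve bytes of the output shown in the question (four characters).
raw = b'\xe8\xbf\x99\xe7\xa7\x8d\xe6\x83\x85\xe5\x86\xb5'

# decode() interprets the bytes as UTF-8 and returns a str.
text = raw.decode('utf-8')

print(type(raw).__name__, len(raw))    # bytes 12
print(type(text).__name__, len(text))  # str 4
print(text)
```

Each Chinese character here takes three bytes in UTF-8, which is why twelve bytes become four characters.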

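However, if that doesn't work, your output is probably not a bytes object but a str whose characters carry the same byte values. In that case, convert the string back to bytes first. Avoid building code with exec for this: it fails on Python 3, where a bytes literal may only contain ASCII characters, and it would execute whatever the data happens to contain.

output = '\xe8\xbf\x99\xe7...'
# latin-1 maps each character U+0000..U+00FF back to the byte with the same value
print(output.encode('latin-1').decode('utf-8'))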
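A str can hold this data in two different ways, and it helps to know which one you have. The following is a minimal sketch, assuming Python 3; fix_mojibake and unescape_literal are hypothetical helper names, not library functions.

```python
def fix_mojibake(s: str) -> str:
    """Recover text from a str whose characters are really UTF-8 byte values."""
    # latin-1 maps code points 0-255 one-to-one onto bytes 0-255.
    return s.encode('latin-1').decode('utf-8')


def unescape_literal(s: str) -> str:
    """Recover text from a str that holds backslash-x escape text."""
    # The escape text is pure ASCII; unicode_escape turns each 4-character
    # escape sequence into the single character with that code point.
    return fix_mojibake(s.encode('ascii').decode('unicode_escape'))


as_chars = '\xe8\xbf\x99\xe7\xa7\x8d'     # 6 characters: U+00E8, U+00BF, ...
as_escapes = r'\xe8\xbf\x99\xe7\xa7\x8d'  # 24 characters of backslash text

print(len(as_chars), len(as_escapes))                          # 6 24
print(fix_mojibake(as_chars) == unescape_literal(as_escapes))  # True
```

Checking len() on your value is a quick way to tell the two cases apart: the escape-text form is four times as long as the byte count.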

That should be able to fix it. Hope you got something useful out of this. Have a good day!


0

This is a bytes object containing valid UTF-8 encoded Chinese text (as far as I can trust Google Translate).

If it's a string literal from your code, add # -*- coding: utf-8 -*- as the first line of your Python file.

If it's external data, convert it to text (str type) like this: bytes_text.decode("utf-8")
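When external data mixes bytes and already-decoded strings, a small normalizing helper can apply the decode only where it is needed. This is a sketch under that assumption; to_text and the sample values are hypothetical.

```python
def to_text(value, encoding='utf-8'):
    """Return value as str, decoding it first if it is bytes-like."""
    if isinstance(value, (bytes, bytearray)):
        return bytes(value).decode(encoding)
    return value


# Hypothetical column values: one still encoded, one already text.
values = [b'\xe6\xb2\xb9\xe6\xb3\xb5', 'ok']
cleaned = [to_text(v) for v in values]

print([type(v).__name__ for v in cleaned])  # ['str', 'str']
```

If the values live in a pandas Series of bytes objects, Series.str.decode("utf-8") performs the same per-element decode.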


0

raw_bytes = b'\xe8\xbf\x99\xe7\xa7\x8d\xe6\x83\x85 . . .'

Here raw_bytes is a <class 'bytes'> object holding your hex-escaped bytes. Call decode on raw_bytes to get a <class 'str'> containing the actual characters.

string_text = raw_bytes.decode("utf-8")
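To see where the \x.. display comes from, it can help to go in the opposite direction: encoding text to UTF-8 and printing the repr of the result reproduces the format from the question. A minimal sketch, with text and raw as placeholder names:

```python
# The first two characters of the decoded sentence, written as code points
# so the source file itself stays ASCII.
text = '\u8fd9\u79cd'

# encode() turns the str into its UTF-8 bytes.
raw = text.encode('utf-8')

# repr() of bytes shows every non-ASCII byte as a \x hex escape.
print(repr(raw))  # b'\xe8\xbf\x99\xe7\xa7\x8d'

# Decoding with the same codec restores the original text.
print(raw.decode('utf-8') == text)  # True
```

So the display is not a separate encoding: it is Python's way of printing UTF-8 bytes that have not been decoded yet.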
