I have tried many ways to convert a string like b'\xef\xbb\xbf\xe5\x9b\xbd\xe9\x99\x85\xe5\x8f\x8b\xe8\xb0\x8a' into Chinese characters, but all of them failed.

Strangely, when I simply run

print(b'\xef\xbb\xbf\xe5\x9b\xbd\xe9\x99\x85\xe5\x8f\x8b\xe8\xb0\x8a')

it shows the decoded Chinese characters.

But when I read the same string from my CSV file, it does not work. No matter how I try to decode the string, it only prints b'\xef\xbb\xbf\xe5\x9b\xbd\xe9\x99\x85\xe5\x8f\x8b\xe8\xb0\x8a'

Here is my script:

import csv 

with open('need_convert.csv','r+') as csvfile:
    reader=csv.reader(csvfile)
    for row in reader:

        new_row=''.join(row)
        print('new_row:')
        print(type(new_row))
        print(new_row)

        print('convert:')
        print(new_row.decode('utf-8'))

Here is my data (csv file):

b'\xef\xbb\xbf\xe5\x9b\xbd\xe9\x99\x85\xe5\x8f\x8b\xe8\xb0\x8a'
b'\xef\xbb\xbf\xe9\xba\x92\xe9\xba\x9f\xe6\x9d\xaf'
b'\xef\xbb\xbf\xe5\x9b\xbd\xe9\x99\x85\xe5\x8f\x8b\xe8\xb0\x8a'

Comments

  • Do not post code/data as images. Post it as text. Commented Jun 19, 2018 at 2:00
  • Have you tried print(str(your_encoding))? Commented Jun 19, 2018 at 2:01
  • Welcome to Stack Overflow! Please edit your question to include the Python code as text, and also include some more examples of the encoded characters in text form. Thanks! Commented Jun 19, 2018 at 2:04
  • You need to read with the correct encoding. Commented Jun 19, 2018 at 2:10
  • Hi Fallenreaper, yes, I've tried your method, but it's not working. Sorry. Commented Jun 19, 2018 at 2:30

1 Answer

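The row contents and new_row are both str objects, not bytes: each CSV cell holds the text of a bytes literal (b'...'), so there is nothing to decode yet. Below, I'm using exec('s=' + row[0]) to evaluate each cell as a bytes literal, assuming the input is safe.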

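import csv

# Read-only access is enough here; newline='' is what the csv module recommends.
with open('need_convert.csv', 'r', newline='') as csvfile:
    reader = csv.reader(csvfile)
    for row in reader:
        # Each cell is a str that merely looks like a bytes literal.
        print(type(row[0]), row[0])
        # Evaluate the text as Python source to get real bytes (trusted input only!).
        exec('s=' + row[0])
        print(type(s), s)
        print(s.decode('utf-8'))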

Output:

<class 'str'> b'\xef\xbb\xbf\xe5\x9b\xbd\xe9\x99\x85\xe5\x8f\x8b\xe8\xb0\x8a'
<class 'bytes'> b'\xef\xbb\xbf\xe5\x9b\xbd\xe9\x99\x85\xe5\x8f\x8b\xe8\xb0\x8a'
国际友谊
<class 'str'> b'\xef\xbb\xbf\xe9\xba\x92\xe9\xba\x9f\xe6\x9d\xaf'
<class 'bytes'> b'\xef\xbb\xbf\xe9\xba\x92\xe9\xba\x9f\xe6\x9d\xaf'
麒麟杯
<class 'str'> b'\xef\xbb\xbf\xe5\x9b\xbd\xe9\x99\x85\xe5\x8f\x8b\xe8\xb0\x8a'
<class 'bytes'> b'\xef\xbb\xbf\xe5\x9b\xbd\xe9\x99\x85\xe5\x8f\x8b\xe8\xb0\x8a'
国际友谊
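
One detail that the output above hides: every value starts with \xef\xbb\xbf, the UTF-8 byte order mark. Decoding with 'utf-8' keeps it as an invisible U+FEFF character at the front of the string, while the standard 'utf-8-sig' codec strips it. A small sketch using the first value from the question (the variable names are only illustrative):

```python
# First value from the question, as real bytes (BOM + four Chinese characters).
raw = b'\xef\xbb\xbf\xe5\x9b\xbd\xe9\x99\x85\xe5\x8f\x8b\xe8\xb0\x8a'

with_bom = raw.decode('utf-8')         # BOM survives as '\ufeff'
without_bom = raw.decode('utf-8-sig')  # BOM is removed

# repr() makes the otherwise invisible BOM character visible.
print(repr(with_bom))
print(repr(without_bom))
print(len(with_bom), len(without_bom))
```

If the decoded strings are later compared or used as dictionary keys, the stripped form is probably the one you want.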

1 Comment

What does one do when they do not trust the input?
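
For untrusted input, one option is ast.literal_eval from the standard library, which parses Python literals only and raises an error on anything else instead of executing it. The sketch below is not the answer's method; cell_to_text is a hypothetical helper, and an in-memory string stands in for need_convert.csv so the example runs on its own:

```python
import ast
import csv
import io

# Stand-in for need_convert.csv: each line is the *text* of a bytes literal.
sample = "\n".join([
    r"b'\xef\xbb\xbf\xe5\x9b\xbd\xe9\x99\x85\xe5\x8f\x8b\xe8\xb0\x8a'",
    r"b'\xef\xbb\xbf\xe9\xba\x92\xe9\xba\x9f\xe6\x9d\xaf'",
]) + "\n"


def cell_to_text(cell):
    """Turn the text of a bytes literal into a decoded str (hypothetical helper)."""
    value = ast.literal_eval(cell)  # parses literals only; does not run code
    if not isinstance(value, bytes):
        raise ValueError('expected a bytes literal, got ' + type(value).__name__)
    return value.decode('utf-8-sig')  # 'utf-8-sig' also drops the leading BOM


decoded = [cell_to_text(row[0]) for row in csv.reader(io.StringIO(sample)) if row]
print(decoded)
```

A cell such as __import__('os').system('...') makes ast.literal_eval raise ValueError rather than run. It is still worth limiting input size, since the documentation notes that very large or deeply nested input can exhaust memory or the stack.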
