I have a CSV file with some data, and one row contains text that was written to the file after being encoded as UTF-8.

This is the text:

"b'\xe7\x94\xb3\xe8\xbf\xaa\xe8\xa5\xbf\xe8\xb7\xaf255\xe5\xbc\x84660\xe5\x8f\xb7\xe5\x92\x8c665\xe5\x8f\xb7 \xe4\xb8\xad\xe5\x9b\xbd\xe4\xb8\x8a\xe6\xb5\xb7\xe6\xb5\xa6\xe4\xb8\x9c\xe6\x96\xb0\xe5\x8c\xba 201205'"

I'm trying to recover the original characters from this text using the decode function, but I can't get it to work.

Does anyone know the correct way to do this?

2 Answers

Assuming that the line in your file is exactly like this:

b'\xe7\x94\xb3\xe8\xbf\xaa\xe8\xa5\xbf\xe8\xb7\xaf255\xe5\xbc\x84660\xe5\x8f\xb7\xe5\x92\x8c665\xe5\x8f\xb7 \xe4\xb8\xad\xe5\x9b\xbd\xe4\xb8\x8a\xe6\xb5\xb7\xe6\xb5\xa6\xe4\xb8\x9c\xe6\x96\xb0\xe5\x8c\xba 201205'

And reading the line from the file gives the output:

>>> line
"b'\\xe7\\x94\\xb3\\xe8\\xbf\\xaa\\xe8\\xa5\\xbf\\xe8\\xb7\\xaf255\\xe5\\xbc\\x84660\\xe5\\x8f\\xb7\\xe5\\x92\\x8c665\\xe5\\x8f\\xb7 \\xe4\\xb8\\xad\\xe5\\x9b\\xbd\\xe4\\xb8\\x8a\\xe6\\xb5\\xb7\\xe6\\xb5\\xa6\\xe4\\xb8\\x9c\\xe6\\x96\\xb0\\xe5\\x8c\\xba 201205'"

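You can try the eval() function, which evaluates the text as a Python bytes literal so that the result can be decoded. Only do this with files you trust, because eval() will execute any code it is given: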

with open(r"your_csv.csv", "r") as csvfile:
    for line in csvfile:
        # when you reach the desired line
        b = eval(line).decode('utf-8')

Output:

>>> print(b)
申迪西路255弄660号和665号 中国上海浦东新区 201205
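
If the file might not be fully trusted, a safer variant of the same idea is ast.literal_eval(), which only accepts Python literals and refuses anything else. The following is a minimal sketch; the helper name decode_bytes_repr and the shortened sample line are made up for illustration and are not part of the original answer:

```python
import ast


def decode_bytes_repr(text, encoding="utf-8"):
    """Turn text that looks like a bytes literal back into the string it encodes."""
    # literal_eval only parses literals, so it cannot run arbitrary code
    value = ast.literal_eval(text.strip())
    if not isinstance(value, bytes):
        raise ValueError("expected the text of a bytes literal")
    return value.decode(encoding)


# Shortened sample: the first two characters and the postcode from the question
line = r"b'\xe7\x94\xb3\xe8\xbf\xaa 201205'"
print(decode_bytes_repr(line))  # 申迪 201205
```

The strip() call removes the trailing newline that comes with each line when iterating over a file.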

3 Comments

The file contents are: b'\xe7\x94\xb3\xe8\...' and when I read and print it, I get <class 'str'> b'\xe7\x94\xb3\xe8'
Can you show what the actual file looks like? Maybe from an editor like Notepad++?
@Madmartigan that is exactly the case this answer handles with eval(). Did you try it?
Try this:

a = b'\xe7\x94\xb3\xe8\xbf\xaa\xe8\xa5\xbf\xe8\xb7\xaf255\xe5\xbc\x84660\xe5\x8f\xb7\xe5\x92\x8c665\xe5\x8f\xb7 \xe4\xb8\xad\xe5\x9b\xbd\xe4\xb8\x8a\xe6\xb5\xb7\xe6\xb5\xa6\xe4\xb8\x9c\xe6\x96\xb0\xe5\x8c\xba 201205'
print(a.decode('utf-8')) #your decoded output

Since you say you are reading from a file, you can try passing the encoding when you open it:

import codecs

# codecs.open decodes the file contents as UTF-8 while reading
with codecs.open('unicode.rst', encoding='utf-8') as f:
    for line in f:
        print(repr(line))
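
Note that opening the file with an encoding does not remove the b'...' wrapper by itself, because the file stores the backslash escapes as ordinary characters. As a rough sketch of combining both steps, the example below reads rows with the csv module and converts only the cells that look like a bytes literal. The helper decode_cell, the two-column layout, and the in-memory sample are assumptions for illustration; with a real file you would pass open(path, "r", encoding="utf-8", newline="") to csv.reader instead:

```python
import ast
import csv
import io


def decode_cell(cell):
    """Decode a cell holding the text of a bytes literal; return other cells unchanged."""
    if cell.startswith(("b'", 'b"')):
        try:
            value = ast.literal_eval(cell)
        except (ValueError, SyntaxError):
            return cell  # not a valid literal, keep the original text
        if isinstance(value, bytes):
            return value.decode("utf-8")
    return cell


# In-memory stand-in for the CSV file (hypothetical columns)
sample_text = "id,address\n" + r'''1,"b'\xe4\xb8\xad\xe5\x9b\xbd 201205'"''' + "\n"
sample = io.StringIO(sample_text)

rows = [[decode_cell(cell) for cell in row] for row in csv.reader(sample)]
print(rows)
```

Cells that do not start with b' or b" pass through untouched, so the header row and ordinary columns are left as they are.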

3 Comments

I know that works. My problem is that I cannot find a way to prepare the string. When I read the row I get "b'\xe7\x94\xb3\xe8\xbf\xaa\xe8\..." but I need b'\xe7\x94\xb3\xe8\xbf\xaa\xe8...'
@Madmartigan OK, in that case I have modified my answer. Try it.
@Narendra OP is asking about python-3. It's enough to use open(path, 'r', encoding='utf-8'). You don't have to use the codecs module.
