4

I am currently using python 2.7 and doing web scraping on a Chinese website.

How to convert unicode below into a string?

Simple str() function does not work and states UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-11: ordinal not in range(128)

Thanks in advance,

    u'\n\xe4\xb8\xad\xe5\x9b\xbd\xe6\xb7\xb1\xe5\x9c\xb3\n'
1

1 Answer 1

2

Your string was already encoded, so it should be a bytes object not a unicode object. Try and solve that problem instead. i.e. the repr of your scraped data should be looking like this:

'\n\xe4\xb8\xad\xe5\x9b\xbd\xe6\xb7\xb1\xe5\x9c\xb3\n'

not like this:

u'\n\xe4\xb8\xad\xe5\x9b\xbd\xe6\xb7\xb1\xe5\x9c\xb3\n'

To recover the Chinese text from the unicode object, you can jump to bytes and back:

>>> text = u'\n\xe4\xb8\xad\xe5\x9b\xbd\xe6\xb7\xb1\xe5\x9c\xb3\n'
>>> print text.encode('latin-1').decode('utf-8')

中国深圳
Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.