Python2.7 UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-11: ordinal not in range(128)

Question

I am using Python 2.7 to scrape a Chinese website.

How can I convert the unicode object below into a string?

A plain str() call does not work; it raises UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-11: ordinal not in range(128)

Thanks in advance,

    u'\n\xe4\xb8\xad\xe5\x9b\xbd\xe6\xb7\xb1\xe5\x9c\xb3\n'

Possible duplicate of UnicodeEncodeError: 'ascii' codec can't encode character u'\xa0' in position 20: ordinal not in range(128)
– ImportanceOfBeingErnest, Commented Nov 14, 2016 at 21:46

wim · Accepted Answer · 2016-11-14 21:45:36Z

2

Your data was already encoded (these are UTF-8 bytes), so it should be a bytes object, not a unicode object. Try to solve that problem at the source instead, i.e. the repr of your scraped data should look like this:

    '\n\xe4\xb8\xad\xe5\x9b\xbd\xe6\xb7\xb1\xe5\x9c\xb3\n'

not like this:

    u'\n\xe4\xb8\xad\xe5\x9b\xbd\xe6\xb7\xb1\xe5\x9c\xb3\n'
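For illustration, here is one way such a value can come about. This is an assumption about the scraper, not something shown in the question: the page's UTF-8 bytes get decoded with the wrong codec (Latin-1), which maps every byte to the code point of the same number. The names `raw`, `wrong` and `right` are made up for this sketch, which should run on both Python 2.7 and Python 3.

```python
# -*- coding: utf-8 -*-
# Hypothetical raw response body: the UTF-8 encoding of the Chinese text.
raw = b'\n\xe4\xb8\xad\xe5\x9b\xbd\xe6\xb7\xb1\xe5\x9c\xb3\n'

# Decoding with the wrong codec keeps every byte as one code point,
# which reproduces the unicode object from the question.
wrong = raw.decode('latin-1')

# Decoding with the codec the page actually uses gives the real text.
right = raw.decode('utf-8')
```

If this is what happens in your scraper, decoding the raw bytes as UTF-8 in the first place avoids the problem entirely.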

To recover the Chinese text from the unicode object, you can jump to bytes and back:

    >>> text = u'\n\xe4\xb8\xad\xe5\x9b\xbd\xe6\xb7\xb1\xe5\x9c\xb3\n'
    >>> print text.encode('latin-1').decode('utf-8')

    中国深圳
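If you need this in more than one place, the round trip can be wrapped in a small helper. This is only a sketch: `fix_mojibake` is a hypothetical name, and it assumes the damage is always UTF-8 bytes read as Latin-1. Anything that does not fit that pattern is returned unchanged. The last line also shows how to get a byte string (a Python 2 `str`) from the repaired text, since a bare `str()` implicitly uses the ASCII codec and fails on non-ASCII characters.

```python
# -*- coding: utf-8 -*-
def fix_mojibake(text):
    """Undo UTF-8 bytes that were wrongly decoded as Latin-1.

    Returns the input unchanged if it does not fit that pattern.
    """
    try:
        return text.encode('latin-1').decode('utf-8')
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Either the text has characters outside Latin-1 (already proper
        # unicode) or the bytes are not valid UTF-8; leave it alone.
        return text


scraped = u'\n\xe4\xb8\xad\xe5\x9b\xbd\xe6\xb7\xb1\xe5\x9c\xb3\n'
fixed = fix_mojibake(scraped)     # u'\n\u4e2d\u56fd\u6df1\u5733\n'

# Encode explicitly to get bytes instead of relying on str().
as_bytes = fixed.encode('utf-8')
```

Treat this as a workaround. Fixing the decoding where the page is read is still the better solution.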

