I'm using Python 2.7.

I have a unicode string like this:

s = u'Rub\xc3\xa9n'

I would like to print this:

print convert(s)
Rubén

I tried printing it directly in several ways, but without success:

print s
Rubén
print s.encode('utf-8')
Rubén
print s.decode('utf-8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 3-4: ordinal not in range(128)

I know the way I declared the string is not ideal, but other scripts produce it in that format.

Thank you very much for your help.

1 Answer

That is a Unicode string whose original bytes were UTF-8 but were mis-decoded as latin1 (or a similar encoding such as windows-1252):

>>> s = 'Rub\xc3\xa9n'.decode('latin1')
>>> s
u'Rub\xc3\xa9n'

It should have been decoded as:

>>> s = 'Rub\xc3\xa9n'.decode('utf8')
>>> s
u'Rub\xe9n'
>>> print s
Rubén

If you don't have control of how the string was generated, you can undo the problem with:

>>> print u'Rub\xc3\xa9n'.encode('latin1').decode('utf8')
Rubén
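
If you want the `convert(s)` function from the question, the round trip above can be wrapped in a small helper. This is only a sketch: the name `convert` comes from the question, and the fallback of returning the input unchanged when the round trip fails is my own assumption about what you would want, not something the other scripts guarantee.

```python
# -*- coding: utf-8 -*-

def convert(text):
    """Undo UTF-8 bytes that were mis-decoded as latin1.

    Hypothetical helper: `text` is assumed to be a unicode string.
    """
    try:
        # latin1 maps code points 0-255 straight back to the original bytes,
        # which are then decoded with the encoding they really had.
        return text.encode('latin1').decode('utf8')
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Assumption: a string that fails the round trip was not damaged
        # in this way, so it is returned as-is.
        return text

fixed = convert(u'Rub\xc3\xa9n')  # u'Rub\xe9n'
```

With this, `print convert(s)` shows Rubén on a UTF-8 terminal. If the string was actually mis-decoded as windows-1252, encoding with 'cp1252' instead of 'latin1' may be the better choice.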