I am trying to read a UTF-8 encoded XML file in Python and process the lines read from it, something like this:

next_sent_separator_index =  doc_content.find(word_value, int(characterOffsetEnd_value) + 1)

Here doc_content is a line read from the file and word_value is one of the strings from that same line. I get an encoding-related error on the line above whenever doc_content or word_value contains non-ASCII characters. So I tried decoding them as UTF-8 first (instead of relying on the default ASCII codec), as below:

next_sent_separator_index =  doc_content.decode('utf-8').find(word_value.decode('utf-8'), int(characterOffsetEnd_value) + 1)

But I still get an error (the traceback reports a UnicodeEncodeError), as shown below:

Traceback (most recent call last):
  File "snippetRetriver.py", line 402, in <module>
    sentences_list,lemmatised_sentences_list = getSentenceList(form_doc)
  File "snippetRetriver.py", line 201, in getSentenceList
    next_sent_separator_index =  doc_content.decode('utf-8').find(word_value.decode('utf-8'), int(characterOffsetEnd_value) + 1)
  File "/usr/lib/python2.7/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 8: ordinal not in range(128)

Can anyone suggest a suitable way to avoid these kinds of encoding errors in Python 2.7?

    What you have is already Unicode, rather than a byte string in UTF-8. You can't decode it any further. (Although you probably want to look at where you got the u'\xe9' from in the first place; it's a character you're unlikely to want.) Commented Jun 3, 2012 at 17:05

1 Answer

codecs.utf_8_decode(input.encode('utf8'))
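Note that codecs.utf_8_decode returns a (text, length) tuple, so the decoded text is the first element of the result. As an alternative to the encode-then-decode round trip, here is a minimal sketch that decodes a value only when it is still a byte string, so text that is already Unicode is never passed through the implicit ASCII encode that raises the error in the traceback. The helper names to_text and find_after and the sample strings are hypothetical, made up for illustration; they are not from the original script.

```python
def to_text(value, encoding='utf-8'):
    """Return Unicode text; decode only when value is a byte string."""
    # `bytes` is an alias of `str` on Python 2.7, so this check works there too.
    if isinstance(value, bytes):
        return value.decode(encoding)
    # Already Unicode: calling .decode() on it in Python 2 would first
    # encode it with the ASCII codec, which fails on u'\xe9'.
    return value


def find_after(doc_content, word_value, start):
    """Search for word_value in doc_content from index start, text against text."""
    return to_text(doc_content).find(to_text(word_value), start)


# Hypothetical sample data: one value is UTF-8 bytes, the other is already Unicode.
doc_bytes = b'un caf\xc3\xa9, deux caf\xc3\xa9s'
word_text = u'caf\xe9'
index = find_after(doc_bytes, word_text, 4)
```

Normalising both operands once, at the point where the file is read, keeps the rest of the processing free of decode calls. Under these assumptions, the offsets passed to find are character offsets into the decoded text, not byte offsets.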