python rest saving parsed xml document - error encoding

Question

I have a function that should save the XML from a response to a file. The input arguments are the response and the file name (objNm):

import xml.etree.ElementTree as ET

def getXml(response, objNm):
    root = ET.fromstring(response.text)
    tree = ET.ElementTree(root)
    xmlNm = objNm + ".xml"
    tree.write(open(xmlNm, 'w'), encoding='unicode')
    print('Object {} was successfully created.'.format(xmlNm))

That gives me an error:

Traceback (most recent call last):
  File "test.py", line 56, in <module>
    getXml(response, 'test_example')
  File "test.py", line 17, in getXml
    root = ET.fromstring(response.text)
  File "/usr/lib64/python2.7/xml/etree/ElementTree.py", line 1300, in XML
    parser.feed(text)
  File "/usr/lib64/python2.7/xml/etree/ElementTree.py", line 1640, in feed
    self._parser.Parse(data, 0)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 142489-142490: ordinal not in range(128)

With root = ET.fromstring(response.text.decode('utf-8')) I get this error instead:

Traceback (most recent call last):
  File "test.py", line 56, in <module>
    getXml(response, 'test_example')
  File "test.py", line 17, in getXml
    root = ET.fromstring(response.text.decode('utf-8'))
  File "/usr/lib64/python2.7/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 142489-142490: ordinal not in range(128)

I have also tried encoding to UTF-8, but that did not help either.

Can anybody help me eliminate this error?

Can you copy here, by any chance, the text between indexes 142489-142490? Theoretically you could do a slice like response.text[142489:142490+1] (andreihondrari, commented Apr 9, 2019 at 12:03)
And I'm assuming that type(response.text) yields bytes? (andreihondrari, commented Apr 9, 2019 at 12:21)
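To look at those characters, as the first comment suggests, a small helper along these lines could print each one with its code point and Unicode name. This is only a sketch: the name show_chars is hypothetical, and it assumes text is a text (unicode/str) object such as response.text.

```python
import unicodedata


def show_chars(text, start, end):
    """Describe each character in text[start:end + 1].

    Hypothetical helper: start and end are the inclusive positions reported
    in the UnicodeEncodeError message (e.g. 142489-142490).
    Returns a list of (index, character, code point, Unicode name) tuples.
    """
    rows = []
    # enumerate(..., start) keeps the original string indexes in the output
    for offset, ch in enumerate(text[start:end + 1], start):
        name = unicodedata.name(ch, "<unnamed>")
        rows.append((offset, ch, "U+%04X" % ord(ch), name))
    for row in rows:
        print("%d: %r %s %s" % row)
    return rows
```

For example, show_chars(response.text, 142489, 142490) would list the two characters that the 'ascii' codec rejected.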

andreihondrari · Accepted Answer · 2019-04-09 12:53:59Z


If you're using Python 2.7, source files are read as ASCII by default, so if your file contains non-ASCII literals you need to specify # -*- coding: utf-8 -*- at the top of it.

Some other things that can be done:

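Call encoded_text = response.text.encode('utf-8', 'replace') and then pass that to ET.fromstring(encoded_text). In Python 2.7, response.text is a unicode object, and handing unicode to the expat parser makes it encode the text with the default ASCII codec, which fails on non-ASCII characters; a UTF-8 byte string avoids that. The same implicit ASCII encoding is why response.text.decode('utf-8') fails: calling decode on a unicode object first encodes it with ASCII.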

Tested via:

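# -*- coding: utf-8 -*-
import codecs

data = u'abcdëëaaë'                     # unicode string with non-ASCII characters
data = data.encode('utf-8', 'replace')  # now a UTF-8 byte string
something = codecs.utf_8_decode(data, 'strict', True)  # decodes without error
print(something)                        # (decoded text, number of bytes consumed)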
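Putting this together, the function from the question could be rewritten along these lines. This is a sketch, assuming response is a requests-style object whose .text holds the decoded body and that the body has no XML declaration naming an encoding other than UTF-8 (if it might, passing the raw bytes in response.content straight to ET.fromstring is an alternative). It also writes the file through tree.write(filename, ...) with encoding='utf-8', since encoding='unicode' is not accepted by ElementTree.write in Python 2.7; returning the file name is an addition for convenience.

```python
import xml.etree.ElementTree as ET


def getXml(response, objNm):
    """Parse the XML body of response and save it as <objNm>.xml (sketch)."""
    # Give the parser UTF-8 bytes rather than a unicode object; under
    # Python 2.7 a unicode argument is implicitly encoded as ASCII and
    # fails on non-ASCII characters.
    encoded_text = response.text.encode('utf-8', 'replace')
    root = ET.fromstring(encoded_text)
    tree = ET.ElementTree(root)
    xmlNm = objNm + ".xml"
    # Passing the file name lets ElementTree open, encode and close the file.
    tree.write(xmlNm, encoding='utf-8', xml_declaration=True)
    print('Object {} was successfully created.'.format(xmlNm))
    return xmlNm
```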

An alternative is to make UTF-8 the interpreter's default encoding (Python 2 only), like this:

import sys
reload(sys)
sys.setdefaultencoding('utf-8')

answered Apr 9, 2019 at 11:30 by andreihondrari (edited Apr 9, 2019 at 12:53)

4 Comments

andreihondrari Over a year ago

@JanFi86 check my answer regarding the file encoding # -*- coding: utf-8 -*-

andreihondrari Over a year ago

@JanFi86 really?? ... I was trying codecs.utf_8_decode(u'abcdë', 'strict', True) in my Python console but still got errors, even with the utf-8 encoding specified... now I'm baffled.

andreihondrari Over a year ago

@JanFi86 check my answer again

JanFi86 Over a year ago

Calling encoded_text = response.text.encode('utf-8', 'replace') definitely helped; I double-checked and it looks fine. Thank you so much!
