It seems I was using the wrong function: with .fromstring() there are no error messages.
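For reference, here is how the two calls differ. lxml.etree.parse() reads from a filename, URL, or file-like object, while lxml.etree.fromstring() parses the XML text itself. The following is a minimal Python 3 sketch. It uses the standard library's xml.etree.ElementTree as a stand-in for lxml.etree, because its parse()/fromstring() split works the same way. The sample payload is hypothetical and stands in for load().

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical payload standing in for the string returned by load().
xml_text = "<root><name>Привет</name></root>"

# fromstring() parses the XML text itself.
root = ET.fromstring(xml_text)

# parse() expects a filename or a file object; wrapping the UTF-8 bytes
# in BytesIO gives it a file-like object to read from.
tree = ET.parse(io.BytesIO(xml_text.encode("utf-8")))

# Passing XML text straight to parse() makes it try to open a file whose
# *name* is the document, which fails (an ASCII document keeps this portable).
try:
    ET.parse("<root/>")
    opened_as_path = False
except OSError:
    opened_as_path = True

print(root.find("name").text == tree.getroot().find("name").text)  # True
print(opened_as_path)  # True
```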
import lxml.etree
xml_ = load() # here comes the unicode string with Cyrillic letters
print xml_ # prints everything fine
print type(xml_) # 'lxml.etree._ElementUnicodeResult', a subclass of unicode
xml = xml_.decode('utf-8') # the error is raised here
doc = lxml.etree.parse(xml) # if I skip the decode, an error is raised here instead
  File "testLog.py", line 48, in <module>
    xml = xml_.decode('utf-8')
  File "/usr/lib/python2.7/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 89-96: ordinal not in range(128)
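An *Encode* error raised from a *decode* call is what Python 2 produces when .decode() is called on a value that is already unicode. Python 2 first encodes the string implicitly with the default ASCII codec, and that step fails on the Cyrillic letters. The following is a minimal Python 3 sketch of the two directions. In Python 3, str plays the role of Python 2's unicode and bytes plays the role of Python 2's str.

```python
# Python 3: str holds decoded text (like Python 2's unicode),
# bytes holds encoded data (like Python 2's str).
text = "Привет"

raw = text.encode("utf-8")      # str -> bytes: encode for output
back = raw.decode("utf-8")      # bytes -> str: decode on input

print(type(raw).__name__, type(back).__name__)  # bytes str
print(raw[:2])                  # b'\xd0\x9f', the UTF-8 bytes for 'П'
print(back == text)             # True
print(hasattr(text, "decode"))  # False: already-decoded text has no decode()
```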
If I encode instead:
xml = xml_.encode('utf-8')
doc = lxml.etree.parse(xml) # the error is raised here
or pass the string unchanged:
xml = xml_
then I get:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xd0 in position 89: ordinal not in range(128)
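Byte 0xd0 is the first byte of most UTF-8-encoded Cyrillic letters, so this message suggests that UTF-8 bytes are being implicitly decoded as ASCII somewhere along the way. The following is a small Python 3 sketch that reproduces the same kind of message. The sample word is arbitrary.

```python
raw = "Привет".encode("utf-8")

message = ""
try:
    raw.decode("ascii")   # ASCII only covers bytes 0-127
except UnicodeDecodeError as exc:
    message = str(exc)

print(message)
# 'ascii' codec can't decode byte 0xd0 in position 0: ordinal not in range(128)
```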
If I understand correctly, I must decode a non-ASCII byte string into the internal (unicode) representation, work with that representation, and encode it back before sending it to output? That seems to be exactly what I am doing.
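That is the usual rule: decode once at the input boundary, work on text, and encode once at the output boundary. Since type(xml_) above already reports a unicode subclass, the decode step has already happened by the time load() returns. The following is a minimal Python 3 sketch of the pattern. It assumes UTF-8 input and uses a hypothetical process_payload() in which upper() stands in for the real work.

```python
def process_payload(raw: bytes, encoding: str = "utf-8") -> bytes:
    """Hypothetical handler: decode on input, work on str, encode on output."""
    text = raw.decode(encoding)   # 1. bytes -> str, once, at the boundary
    text = text.upper()           # 2. all processing happens on str
    return text.encode(encoding)  # 3. str -> bytes, once, on the way out


incoming = "привет".encode("utf-8")
outgoing = process_payload(incoming)
print(outgoing.decode("utf-8") == "ПРИВЕТ")  # True
```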
The input data must be in UTF-8 because of the 'Accept-Charset': 'utf-8' header.
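Accept-Charset is a request header. It states which charsets the client prefers, while the charset the server actually used is declared by the charset parameter of the response's Content-Type header. The following is a hedged Python 3 sketch that reads that parameter with the standard library's email.message.Message class. The helper name and the UTF-8 fallback are assumptions.

```python
from email.message import Message


def charset_from_content_type(content_type: str, default: str = "utf-8") -> str:
    """Hypothetical helper: return the charset parameter of a Content-Type value."""
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() lowercases the charset it finds and returns the
    # fallback unchanged when no charset parameter is present.
    return msg.get_content_charset(default)


print(charset_from_content_type("text/xml; charset=UTF-8"))  # utf-8
print(charset_from_content_type("text/xml"))                 # utf-8 (fallback)
```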