I have an XML file that contains some German umlauts. My goal is to read the file and store the results in a database. For testing I have two different files: according to chardet, the first is UTF-8-SIG and the other is UTF-8.
After reading the file with lxml, I preprocess each value with unicode(field[0]).
Parsing the first file works fine, but processing the second one raises an encoding error: UnicodeEncodeError: 'ascii' codec can't encode characters in position: ordinal not in range(128)
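To narrow it down, here is a minimal sketch that produces the same kind of error outside of my script. It assumes the failure comes from the text being encoded with the ASCII codec somewhere along the way; the explicit .encode('ascii') call and the helper name try_ascii are only my stand-ins for that step, not code from the real program.

```python
# -*- coding: utf-8 -*-
# Sample value taken from the second file.
value = u'Zubeh\xf6r'


def try_ascii(text):
    """Return the error message raised when encoding text as ASCII, or None."""
    try:
        # Assumption: this mimics an implicit ASCII encode happening elsewhere.
        text.encode('ascii')
    except UnicodeEncodeError as exc:
        return str(exc)
    return None


message = try_ascii(value)
print(message)
```

Pure ASCII values pass through this helper without an error, so only values with umlauts would be affected if this assumption is right.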
For example, one such string is u'Zubeh\xf6r' (from print(field[0])).
Using print(field[0].encode("utf-8")) prints the correct string, but the resulting type is str instead of unicode.
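The type change can be seen in isolation with the small sketch below. The variable names are made up for illustration; the value is the sample string from above.

```python
# -*- coding: utf-8 -*-
value = u'Zubeh\xf6r'

# Encoding produces a byte string (str on Python 2, bytes on Python 3).
encoded = value.encode('utf-8')

# Decoding with the same codec gives a text (unicode) object again.
decoded = encoded.decode('utf-8')

print(type(encoded))
print(type(decoded))
print(decoded == value)
```

So encoding is fine for printing, but as far as I can tell the value has to stay a unicode object (or be decoded again) if the rest of the code expects unicode.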