I am trying to load a UTF-8 file containing text in 14 different languages into Python (version 2.6.6). I am using the Python codecs module to decode the text file.
import codecs
f = open('C:/temp/list_test.txt', 'r')
for lines in f:
    # decode each raw byte line to unicode before filtering
    line = filter_str(lines.decode("utf-8"))
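An alternative to decoding each line by hand is to let codecs.open decode while reading. The following is a minimal sketch, not the code from the question: read_utf8_lines is a hypothetical helper name, a throwaway temp file stands in for C:/temp/list_test.txt, and filter_str is left out because its definition is not shown. The 'utf-8-sig' codec is an assumption on my part; it behaves like 'utf-8' but also drops a leading byte order mark if an editor wrote one.

```python
import codecs
import os
import tempfile


def read_utf8_lines(path):
    """Return the lines of a UTF-8 file as unicode strings (hypothetical helper)."""
    # codecs.open decodes while reading, so no per-line .decode() call is needed.
    # 'utf-8-sig' also strips a leading byte order mark if one is present.
    f = codecs.open(path, 'r', 'utf-8-sig')
    try:
        return [line.rstrip(u'\r\n') for line in f]
    finally:
        f.close()


# Demo on a throwaway file (stands in for C:/temp/list_test.txt).
fd, demo_path = tempfile.mkstemp(suffix='.txt')
os.close(fd)
out = codecs.open(demo_path, 'w', 'utf-8')
out.write(u'Dziennik zak\u0142\xf3ce\u0144\n')
out.close()

demo_lines = read_utf8_lines(demo_path)
os.remove(demo_path)
```

Only ASCII escapes are used in the source so the sketch should behave the same on Python 2.6 and Python 3.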
This all works well. I parse the entire file and then want to export 14 separate files, one per language. The problem that I can't understand is the following.
I use the following code for output:
malangout = codecs.open("C:/temp/polish.txt", 'w', 'utf-8', 'surrogateescape')
for item in lang_dic['English']:
    temp = lang_dic[lang1][item]
    malangout.write(temp + '\n')
malangout.close()
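To check whether the output step itself can be at fault, it may help to write a known-good unicode string and inspect the raw bytes on disk. This is a minimal sketch under stated assumptions: write_utf8_lines is a hypothetical helper, a temp file stands in for the real output path, and the default 'strict' error handler is used because 'surrogateescape' is a Python 3 error handler that Python 2.6 does not define.

```python
import codecs
import os
import tempfile


def write_utf8_lines(path, lines):
    """Write unicode strings to path as UTF-8, one per line (hypothetical helper)."""
    out = codecs.open(path, 'w', 'utf-8')
    try:
        for text in lines:
            out.write(text + u'\n')
    finally:
        out.close()


fd, demo_path = tempfile.mkstemp(suffix='.txt')
os.close(fd)
write_utf8_lines(demo_path, [u'Dziennik zak\u0142\xf3ce\u0144'])

# Read the raw bytes back to see exactly what landed on disk.
raw = open(demo_path, 'rb')
written = raw.read()
raw.close()
os.remove(demo_path)
```

If the bytes decode back to the same text, the writer is doing its job, and any corruption in the real output would have to be present in the strings before they are written.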
Example:
- Language: Polish
- Expected output: Dziennik zakłóceń
- Actual output: Dziennik zak‚óceƒ
The string is stored as follows:
u'Dziennik zak\u201a\xf3ce\u0192'
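One way to narrow this down is to compare the stored value with the expected text code point by code point. The sketch below is only a diagnostic aid; diff_code_points is a hypothetical helper, and expected is my own transcription of the expected Polish text using escapes.

```python
import unicodedata

expected = u'Dziennik zak\u0142\xf3ce\u0144'  # the expected Polish text
stored = u'Dziennik zak\u201a\xf3ce\u0192'    # the value shown in the question


def diff_code_points(a, b):
    """List (index, name in a, name in b) for each position where a and b differ."""
    return [(i, unicodedata.name(x), unicodedata.name(y))
            for i, (x, y) in enumerate(zip(a, b)) if x != y]


mismatches = diff_code_points(expected, stored)
for index, want, got in mismatches:
    print('%d: expected %s, stored %s' % (index, want, got))
```

The two strings differ at positions 12 and 16, where U+0142 and U+0144 were expected but U+201A and U+0192 are stored. That suggests the wrong characters are already in the unicode object before anything is written, which would point at the reading or parsing step rather than at the output codec.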
I have tried many encodings from the Python docs (section 7.8, codecs). Any information would help at this point.
What is the output of import locale; print(locale.getpreferredencoding()) on your system?