I have a function like this:
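```python
# Python 2 code
import logging

# Assumption: `log` is an ordinary module-level logger.
log = logging.getLogger(__name__)


def convert_to_unicode(data):
    # Pass a missing row through unchanged.
    if data is None:
        return data
    row = {}
    try:
        for key, val in data.items():
            if isinstance(val, str):
                # In Python 2, str is a byte string; decode() already returns
                # unicode, so the unicode() wrapper is redundant but harmless.
                row[key] = unicode(val.decode('utf8'))
            else:
                row[key] = val
        return row
    except Exception, ex:
        # On failure the error is logged and the function returns None.
        log.debug(ex)
```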
I feed it a result set (fetched with `MySQLdb.cursors.DictCursor`) row by row to convert all string values to unicode; for example, `{'column_1': 'XXX'}` becomes `{'column_1': u'XXX'}`.
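For comparison, a minimal Python 3 sketch of the same per-row transformation. It assumes the driver hands back raw `bytes` values (in Python 3, `bytes` plays the role of Python 2's `str`, and `str` the role of `unicode`); the name `decode_row` is hypothetical.

```python
def decode_row(row, encoding='utf-8'):
    """Return a copy of `row` with every bytes value decoded to str."""
    if row is None:
        return None
    # Non-bytes values (ints, dates, None, ...) are passed through unchanged.
    return {key: val.decode(encoding) if isinstance(val, bytes) else val
            for key, val in row.items()}


print(decode_row({'column_1': b'XXX', 'column_2': 42}))
# {'column_1': 'XXX', 'column_2': 42}
```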
The problem is that one of the rows has a value like `{'column_1': 'Gabriel García Márquez'}`, and it does not get transformed. Instead, it throws this error:
'utf8' codec can't decode byte 0xed in position 12: invalid continuation byte
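The error can be reproduced without MySQL. The following is a minimal Python 3 sketch that assumes (this is not confirmed by the post) that the stored bytes are Latin-1. In Latin-1, `í` is the single byte `0xed`, and `Gabriel Garc` is exactly 12 characters long, which matches the position in the message. In UTF-8, `0xed` can only begin a three-byte sequence, so the plain `a` that follows is rejected as an invalid continuation byte. The accented characters are written as `\u` escapes to keep the source ASCII-only.

```python
# Assumption: the column holds Latin-1 encoded bytes, not UTF-8.
name = 'Gabriel Garc\u00eda M\u00e1rquez'
raw = name.encode('latin-1')

caught = None
try:
    raw.decode('utf-8')
except UnicodeDecodeError as exc:
    caught = exc  # keep a reference; `exc` is unbound after the except block

print(caught.start, hex(raw[caught.start]), caught.reason)
# 12 0xed invalid continuation byte

# The same bytes decode cleanly when read as Latin-1.
print(raw.decode('latin-1') == name)  # True
```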
When I searched for this, it seemed to have something to do with ASCII encoding.
The solutions I tried are:

1. Adding `# -*- coding: utf-8 -*-` at the beginning of my file... does not help.
2. Changing the line `row[key] = unicode(val.decode('utf8'))` to `row[key] = unicode(val.decode('utf8', 'ignore'))`... as expected, it ignores the non-ASCII characters and returns `{'column_1': u'Gabriel Garca Mrquez'}`.
3. Changing the line `row[key] = unicode(val.decode('utf8'))` to `row[key] = unicode(val.decode('latin-1'))`... this does the job, but I am afraid it will only support Western European characters (as per here).
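The three attempts can be compared directly in a minimal Python 3 sketch, again assuming Latin-1 input bytes. The `# -*- coding: ... -*-` declaration only tells Python how to read the source file itself, so it cannot affect bytes that come from the database. `'ignore'` silently drops undecodable bytes, `'replace'` at least marks them with U+FFFD, and `latin-1` never raises because it maps all 256 byte values to code points, which also means it can silently produce wrong text if the bytes are really in another encoding. The helper `decode_with_fallback` is hypothetical: trying UTF-8 first and falling back to a single-byte codec is one common heuristic, not something the post or MySQLdb prescribes, and the fallback codec should match whatever encoding the column actually uses.

```python
# Assumption: column_1 holds Latin-1 bytes, column_2 holds UTF-8 bytes.
name = 'Gabriel Garc\u00eda M\u00e1rquez'
raw = name.encode('latin-1')

ignored = raw.decode('utf-8', 'ignore')    # 'Gabriel Garca Mrquez'
replaced = raw.decode('utf-8', 'replace')  # 'Gabriel Garc\ufffda M\ufffdrquez'
as_latin1 = raw.decode('latin-1')          # correct text for Latin-1 input


def decode_with_fallback(val, fallback='latin-1'):
    """Hypothetical helper: decode as UTF-8, else as the `fallback` codec."""
    try:
        return val.decode('utf-8')
    except UnicodeDecodeError:
        # Only reached for bytes that are not valid UTF-8.
        return val.decode(fallback)


row = {'column_1': raw, 'column_2': name.encode('utf-8'), 'column_3': 7}
decoded = {key: decode_with_fallback(val) if isinstance(val, bytes) else val
           for key, val in row.items()}
```

The fallback only runs when UTF-8 decoding fails, so genuinely UTF-8 values keep their full character range.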
Can anybody point me in the right direction, please?