What's the best way to convert every string in a list (containing other lists) to unicode in Python?
For example:
[['a','b'], ['c','d']]
to
[[u'a', u'b'], [u'c', u'd']]
>>> li = [['a','b'], ['c','d']]
>>> [[v.decode("UTF-8") for v in elem] for elem in li]
[[u'a', u'b'], [u'c', u'd']]
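If the nesting can go deeper than one level, the same explicit decode can be applied recursively. This is only a rough sketch: the names to_text, nested and converted are made up for illustration, and it assumes the input contains byte strings (plain str on Python 2, bytes on Python 3) and that UTF-8 is the right encoding for your data.

```python
def to_text(value, encoding="utf-8"):
    """Recursively decode byte strings inside nested lists (hypothetical helper)."""
    if isinstance(value, bytes):
        # Byte string: decode with an explicit encoding, never the default one.
        return value.decode(encoding)
    if isinstance(value, list):
        # Nested list: convert each element and keep the same shape.
        return [to_text(item, encoding) for item in value]
    # Anything else (already text, numbers, None) passes through unchanged.
    return value


# The last element is the UTF-8 encoding of a non-ASCII word, nested one level deeper.
nested = [[b"a", b"b"], [b"c", [b"caf\xc3\xa9"]]]
converted = to_text(nested)
```

On Python 2, bytes is just an alias for str, so the same function should cover the lists from the question.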
unicode(v). The difference is obvious: that's using the default encoding, which is usually 'ascii', and almost always wrong. Your second edit changed it to unicode(v, "UTF-8"), which is functionally equivalent to the decode call: a bit less clear, and not forward-compatible with 3.x, but not actually bad. But I was responding to the first edit.
unicode() is not there in 3.x, so I removed it. But maybe I'll add it back and make a note of it.
Unfortunately, there isn't an easy answer with unicode. But fortunately, once you understand it, it'll carry with you to other programming languages.
This is, by far, the best resource that I've seen for python unicode:
http://nedbatchelder.com/text/unipain/unipain.html
Use the arrow keys (on your keyboard) to navigate to the next and previous slides.
Also, please take a look at this (and the other links from the end of that slideshow).
Adding u to the start of the string "abc" just gives you the string "uabc"; it doesn't give you the unicode string u"abc". The u isn't part of the string any more than the quotes are.
>>> l = [['a','b'], ['c','d']]
>>> map(lambda x: map(unicode, x), l)
[[u'a', u'b'], [u'c', u'd']]
sys.getdefaultencoding(). And fixing it to take the encoding means either a lambda inside the lambda, or a partial inside the lambda; either way, I think it's much simpler to use a comprehension here.
sys.getdefaultencoding(), and it looks nice and clear.
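To make the comparison from the comments concrete, here is a rough side-by-side of the two spellings with the encoding pinned. It's a sketch, not the answerer's code: text_type, decode_utf8, via_map and via_comprehension are names invented here, and the type(u"") trick is just one way to get unicode on 2.x and str on 3.x.

```python
from functools import partial

# unicode on Python 2, str on Python 3 (illustrative alias, not from the thread).
text_type = type(u"")

li = [[b"a", b"b"], [b"c", b"d"]]

# map version: partial fixes the encoding so the default one is never consulted.
decode_utf8 = partial(text_type, encoding="utf-8")
via_map = list(map(lambda row: list(map(decode_utf8, row)), li))

# Comprehension version: the same result with the encoding visible inline.
via_comprehension = [[v.decode("utf-8") for v in row] for row in li]
```

Both should produce the same nested list; the comprehension just gets there without the extra lambda and partial.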