Suppose I have a mysterious unicode string in Python (2.7) that I want to feed to a command line program such as imagemagick (or really just get it out of Python in any way). The strings might be:
- Adolfo López Mateos
- Stanisława Walasiewicz
- Jörgen Jönsson
So in Python I might make a little command like this:
cmd = u'convert -pointsize 24 label:"%s" "%s.png"' % (name, name)
If I just print cmd, I get convert -pointsize 24 label:"Jörgen Jönsson" "Jörgen Jönsson.png", and if I then run that myself, everything is fine:
- Adolfo López Mateos.png
- example 1 http://4u.jeffcrouse.info/stackoverflow/A-01.png
- Stanisława Walasiewicz.png
- example 2 http://4u.jeffcrouse.info/stackoverflow/A-02.png
But if I do os.system( cmd ), I get this:
- Adolfo López Mateos.png
- example 4 http://4u.jeffcrouse.info/stackoverflow/B-01.png
- Stanisława Walasiewicz.png
- example 5 http://4u.jeffcrouse.info/stackoverflow/B-02.png
I know it's not an imagemagick problem because the filenames are messed up too. I know that Python is converting the command to ASCII when it passes it off to os.system, but why is it getting the encoding so wrong? Why is it interpreting each non-ASCII character as 2 characters? According to a few articles that I've read, it might be because it's encoded as Latin-1 but read as UTF-8, but I've tried encoding it back and forth between the two and it's not helping.
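For illustration, the "one character becomes two" symptom can be reproduced without imagemagick. This is only a sketch under an assumption: that the bytes leaving Python are UTF-8 and that whatever receives them decodes them with a single-byte codec (mac-roman and latin-1 are used here purely as examples). The helper name `misread` is made up for this example.

```python
from __future__ import print_function


def misread(text, written_as="utf-8", read_as="mac-roman"):
    """Encode text with one codec, then decode the same bytes with another."""
    return text.encode(written_as).decode(read_as)


# u"\xf6" is o-umlaut; escapes keep this source file pure ASCII.
name = u"J\xf6rgen J\xf6nsson"

# In UTF-8 each o-umlaut takes two bytes (0xC3 0xB6),
# so the byte string is two bytes longer than the text.
utf8_bytes = name.encode("utf-8")
print(len(name), len(utf8_bytes))

# A single-byte codec turns every byte into one character,
# so each o-umlaut comes back as two unrelated characters.
garbled = misread(name)
print(len(garbled))
```

If the broken output shows two symbols wherever one accented letter was expected, that pattern points toward a UTF-8 versus single-byte mismatch like this one rather than toward a conversion to ASCII.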
I get Unicode exceptions when I try to just encode it manually as ascii without a replacement argument, but if I do name.encode('ascii','xmlcharrefreplace'), I get the following:
- example 4 http://4u.jeffcrouse.info/stackoverflow/C-01.png
- example 5 http://4u.jeffcrouse.info/stackoverflow/C-02.png
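The two ASCII attempts described above can be reproduced in isolation. This is a minimal sketch using one of the names from the question. A strict ASCII encode raises, while 'xmlcharrefreplace' swaps each non-ASCII character for a numeric XML entity, which a program that does not interpret XML would presumably show literally.

```python
from __future__ import print_function

# u"\u0142" is the Polish l-with-stroke in "Stanislawa".
name = u"Stanis\u0142awa Walasiewicz"

# Strict ASCII encoding fails because that character has no ASCII equivalent.
try:
    name.encode("ascii")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

# 'xmlcharrefreplace' substitutes "&#NNN;" where NNN is the decimal code point.
escaped = name.encode("ascii", "xmlcharrefreplace")

print(strict_ok)
print(escaped.decode("ascii"))
```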
I'm hoping that someone recognizes this particular kind of encoding problem and can offer some advice, because I'm about out of ideas.
Thanks!
os.system(cmd.encode("mac-roman")) (cmd is a unicode string)... there is no point in decoding and encoding right after
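The suggestion above amounts to choosing the encoding explicitly instead of letting Python pick one. A rough sketch of that idea follows. The helper `build_command` is hypothetical, and the right encoding depends on what the shell and filesystem expect: "utf-8" is only an assumed default here, and "mac-roman" is the value from the comment.

```python
from __future__ import print_function


def build_command(name, encoding="utf-8"):
    """Build the convert command as unicode, then encode it explicitly."""
    cmd = u'convert -pointsize 24 label:"%s" "%s.png"' % (name, name)
    return cmd.encode(encoding)


cmd_bytes = build_command(u"J\xf6rgen J\xf6nsson")

# The actual call is left commented out so the sketch runs without imagemagick:
# import os
# os.system(cmd_bytes)

print(len(cmd_bytes))
```

Note that mac-roman cannot represent every name in the question: it has no "ł", so encoding "Stanisława Walasiewicz" with it raises UnicodeEncodeError, whereas UTF-8 can encode all three names.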