How can I fix encoding errors in a string in Python?

Question

I have a Python script running as a Subversion pre-commit hook, and I am running into problems with UTF-8 encoded text in the commit messages. For example, if the input character is "å", the output is "?\195?\165". What would be the easiest way to replace those escaped parts with the corresponding byte values? A simple regexp replacement doesn't work, as I need to process each element and merge them back together.

Code sample:

import subprocess
import sys
infoCmd = ["/usr/bin/svnlook", "info", sys.argv[1], "-t", sys.argv[2]]
info = subprocess.Popen(infoCmd, stdout=subprocess.PIPE).communicate()[0]
info = info.replace("?\\195?\\166", "æ")  # handles only this one hard-coded character

It'll probably help if you post the code that is causing the issues.
– Daniel DiPaolo, commented Feb 18, 2011 at 16:28

Jeremy Whitlock · Accepted Answer · 2011-02-18 16:28:17Z


I do the same thing in my code, and you should be able to use:

... u_changed_path = unicode(changed_path, 'utf-8') ...

When using the approach above, I've only run into issues with characters like line feeds and such. If you post some code, it could help.
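For the escaped output shown in the question, decoding alone may not be enough, because "?\195?\165" is already plain ASCII text rather than raw UTF-8 bytes. Here is a minimal sketch of one way to undo it, assuming each escape is a question mark, a backslash, and a three-digit decimal byte value (195 and 165 are the decimal UTF-8 bytes of "å", which matches the example). The function name unescape_svnlook is made up for illustration and is not part of Subversion or Python. It is written for Python 3, where bytes.decode('utf-8') plays the role of unicode(..., 'utf-8').

```python
import re

# Assumed escape format, inferred from the question's example:
# "?" + backslash + three decimal digits, e.g. ?\195 for byte 0xC3.
_ESCAPE = re.compile(rb"\?\\(\d{3})")


def unescape_svnlook(raw):
    """Turn '?\\NNN' escapes in a bytes object back into raw bytes,
    then decode the result as UTF-8 text."""
    def to_byte(match):
        value = int(match.group(1))
        # Values above 255 cannot be a single byte, so leave them untouched.
        return bytes([value]) if value < 256 else match.group(0)

    restored = _ESCAPE.sub(to_byte, raw)
    # errors="replace" keeps the hook from crashing on malformed input.
    return restored.decode("utf-8", errors="replace")


if __name__ == "__main__":
    print(unescape_svnlook(b"Fixed ?\\195?\\165 and ?\\195?\\166"))
```

Because the substitution works on individual bytes first and decodes only at the end, multi-byte characters are merged back together automatically, so there is no need for a hard-coded replace() per character. The input would be the bytes returned by communicate()[0].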

