0

I have a python script as a subversion pre-commit hook, and I encounter some problems with UTF-8 encoded text in the submit messages. For example, if the input character is "å" the output is "?\195?\165". What would be the easiest way to replace those character parts with the corresponding byte values? Regexp doesn't work as I need to do processing on each element and merge them back together.

code sample:

infoCmd = ["/usr/bin/svnlook", "info", sys.argv[1], "-t", sys.argv[2]]
info = subprocess.Popen(infoCmd, stdout=subprocess.PIPE).communicate()[0]
info = info.replace("?\\195?\\166", "æ")
1
  • 2
    It'll probably help if you post the code that is causing the issues. Commented Feb 18, 2011 at 16:28

1 Answer 1

1

I do the same things in my code and you should be able to use:

... u_changed_path = unicode(changed_path, 'utf-8') ...

When using the approach above, I've only run into issues with characters like line feeds and such. If you post some code, it could help.

Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.