I'm trying to convert my Python 2 script to Python 3. How do we do Regex with Unicode?

This is what I had in Python 2, which works. It replaces straight double quotes with « and »:

text = re.sub(ur'"(.*?)"', ur'«\1»', text)

I have some really complex patterns, which the "ur" prefix made easy to write. But this doesn't work in Python 3:

text = re.sub(ur'ه\sایم([\]\.،\:»\)\s])', ur'ه\u200cایم\1', text)

  • You do not need u in Python 3 as all strings are Unicode by default. Omit the u prefixes. Commented Dec 16, 2016 at 9:48
  • @Klaus D. IMO not a duplicate. Referenced question is for python 2.x Commented Dec 16, 2016 at 10:18
  • Thanks. Removing u fixed the problem. Commented Dec 16, 2016 at 10:31

2 Answers

All strings in Python 3 are Unicode by default. Just remove the u from the prefix (keep the r) and you should be fine.

In Python 2, string literals are byte strings by default, so the u prefix is used to mark them as Unicode strings.
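
As a minimal sketch of what that looks like for the two substitutions in the question (the function names and the sample string are made up for illustration): the patterns can stay raw strings. One caveat, as far as I can tell: a raw `r'\u200c'` in the replacement is no longer expanded by the string literal, and newer Python 3 versions reject unknown letter escapes in replacement templates, so this sketch uses a normal string for the replacement and doubles the backslash of the backreference.

```python
import re

def curly_quotes(text):
    # Raw strings without the u prefix; str is already Unicode in Python 3.
    return re.sub(r'"(.*?)"', r'«\1»', text)

def join_with_zwnj(text):
    # The pattern is unchanged from the question, minus the u prefix.
    pattern = r'ه\sایم([\]\.،\:»\)\s])'
    # Normal (non-raw) string: the literal itself turns \u200c into the
    # zero-width non-joiner, and \\1 reaches re.sub as the backreference \1.
    replacement = 'ه\u200cایم\\1'
    return re.sub(pattern, replacement, text)

sample = 'رفته ایم.'  # hypothetical sample input
print(curly_quotes('say "hi"'))
print(join_with_zwnj(sample) == sample.replace(' ', '\u200c'))
```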

1 Comment

What about if you're writing code that is supposed to work in both Python 2 and 3?
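
One possible approach for code that has to run on both, sketched under the assumption that Python 3.3 or later is the Python 3 target: the plain `u'...'` prefix is accepted there again, while the combined `ur'...'` prefix is not. So the sketch uses non-raw `u'...'` literals and doubles the backslashes meant for the regex engine. The names below are illustrative only.

```python
# -*- coding: utf-8 -*-
import re

# Non-raw u'' literals: \u00ab and \u00bb are expanded by the literal,
# and \\1 is passed to re.sub as the backreference \1.
QUOTE_PATTERN = u'"(.*?)"'
QUOTE_REPLACEMENT = u'\u00ab\\1\u00bb'  # «\1»

def curly_quotes_portable(text):
    # text is expected to be a Unicode string (unicode in Python 2, str in Python 3).
    return re.sub(QUOTE_PATTERN, QUOTE_REPLACEMENT, text)

print(curly_quotes_portable(u'say "hi"'))
```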

Since Python 3.0, the language features a str type that contains Unicode characters, meaning any string created using "unicode rocks!", 'unicode rocks!', or the triple-quoted string syntax is stored as Unicode.

This is from the Unicode HOWTO. That doc will help you.
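
A small illustration of the point quoted above, with a made-up sample string: a Python 3 `str` counts code points, and bytes only appear once you encode it explicitly.

```python
text = 'unicode rocks! «ok»'      # a plain literal is already a Unicode str
encoded = text.encode('utf-8')    # explicit conversion to bytes

print(type(text).__name__, len(text))        # length in code points
print(type(encoded).__name__, len(encoded))  # length in bytes; « and » take 2 bytes each
print(encoded.decode('utf-8') == text)       # decoding round-trips to the same str
```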

So you can write the same patterns you used in Python 2, just without the u prefix, and they will work with no extra effort.
