2

I want to port this code from python2 to python3

p = re.compile(ur"(?s)<u>(.*?)<\/u>")
subst = "\underline{\\1}"
raw_html = re.sub(p, subst, raw_html)

I figured out already that the ur shall be changed to just r:

p = re.compile(r"(?s)<u>(.*?)<\/u>")
subst = "\underline{\\1}"
raw_html = re.sub(p, subst, raw_html)

however it does not work it complains about this:

cd build && PYTHONWARNINGS="ignore" python3 ../src/katalog.py --katalog 1
Traceback (most recent call last):
  File "src/katalog.py", line 11, in <module>
    from common import *
  File "src/common.py", line 207
    subst = "\underline{\\1}"
                             ^
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 0-1: truncated \uXXXX escape
make: *** [katalog1] Error 1

however changing it to "\underline" does not help either. It is not replacing it then.

0

1 Answer 1

2

Use

import re
raw_html = r"<u>1</u> and <u>2</u>"
p = re.compile(r"(?s)<u>(.*?)</u>")
subst = r"\\underline{\1}"
raw_html = re.sub(p, subst, raw_html)
print(raw_html)

See Python proof, the results are \underline{1} and \underline{2}. Basically, inside replacement, use double backslash to replace with a single backslash. Use raw string literals to make life easier with regex in Python.

Sign up to request clarification or add additional context in comments.

1 Comment

Absolute correct. Thanks for the clarification!

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.