0

I want to use regex to replace <p ....> by '' and </p> by <br>:

<p style="text-align:center;">1, 2, 3, 5, 8, 13, 21, 34, 55, 89, ...<\p>

I tried:

re.sub("[\<\[].*?[\\>\\]]", '' '', x)

but it erased everything.

Can someone help me?

2
  • 2
    You have an extra ] at the end of your regex Commented Dec 17, 2018 at 15:20
  • <\/*p.*?> match 1 and 2. Commented Dec 17, 2018 at 15:35

2 Answers 2

2

You should never use regexes for XML/HTML. XML is good at nesting tags, and nested tags are a nightmare for regexes. You should go with lxml of BeautifoulSoup here.

That being said, for very simple use cases, regexes can do the jobs, provided you can make sure that no nesting can occur.

Assuming that you have (note the /p instead of \p):

x = '<p style="text-align:center;">1, 2, 3, 5, 8, 13, 21, 34, 55, 89, ...</p>'

you can use:

>>> re.sub(r'<p.*?>(.*?)</p>', r'\1<br/>', x)
'1, 2, 3, 5, 8, 13, 21, 34, 55, 89, ...<br/>'
Sign up to request clarification or add additional context in comments.

Comments

2

One option could be to use a capturing group to get the text out of the tag, then add a <br> to the end:

pat = re.compile(r'<p[^>]*>(.*)<\\p>')  # or </p>, as required

print(" {}<br>".format(pat.match(x).group(1)))
# 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, ...<br>

Or you could do two replacements

pat1 = re.compile(r'<p[^>]*>')
pat2 = re.compile(r'<\\p>')

pat1.sub(' ', pat2.sub('<br>', x))
# 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, ...<br>

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.