
I'm dealing with single HTML strings like this

>>> s = 'u><br/>\n                                    Some text <br/><br/><u'

where I've got meaningful text embedded inside broken HTML or incomplete HTML tags. I need to extract only that inner text, and ignore the broken HTML. How can I do this? I'm using

>>> re.search(r'(.>)(<.>)(.>)', s)
>>>

but this returns None.

1 Answer

If I understand you right, you're looking to take this input:

u><br/>\n                                    Some text <br/><br/><u

And receive this output:

\n                                    Some text 

This is done simply enough by only caring about what comes between the two inward-pointing brackets. We want:

  • A right-bracket > (so we know where to begin)
  • The content itself (here, \n followed by whitespace and Some text), which must not contain a left-bracket
  • A left-bracket < (so we know where to end)

As a regular expression, that is:

>>> s = 'u><br/>\n                                    Some text <br/><br/><u'
>>> re.search(r'>([^<]+)<', s)
<_sre.SRE_Match object; span=(6, 55), match='>\n                                    Some text <'>

(The captured group can be accessed via .group(1).)
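
Because re.search returns None when nothing matches, it is worth checking the result before calling .group(1). A minimal sketch of that check, where inner_text is a hypothetical helper name used only for illustration:

```python
import re

def inner_text(s):
    """Return the first run of text between '>' and '<', or None if there is none."""
    match = re.search(r'>([^<]+)<', s)
    # re.search returns None when the pattern does not match anywhere in s
    return match.group(1) if match else None

s = 'u><br/>\n                                    Some text <br/><br/><u'
print(repr(inner_text(s)))          # the captured text, including surrounding whitespace
print(inner_text(s).strip())        # Some text
print(inner_text('no brackets'))    # None
```

Calling .strip() on the result is optional; it only removes the leading newline and padding if you do not need them.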

Additionally, you may want to use re.findall if you expect there to be multiple matches per line:

>>> re.findall(r'>([^<]+)<', s)
['\n                                    Some text ']

EDIT: To address the comment: If you have multiple matches and you want to connect them into a single string (effectively removing all HTML-like tag things), do:

>>> s = 'nbsp;<br><br>Some text.<br>Some \n more text.<br'
>>> ' '.join(re.findall(r'>([^<]+)<', s))
'Some text. Some \n more text.'
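
If you need this in more than one place, the join can be wrapped in a small function. The following is only a sketch: strip_tags and its collapse_whitespace flag are hypothetical names, and the whitespace collapsing is an optional extra that goes beyond what was asked.

```python
import re

def strip_tags(s, collapse_whitespace=False):
    """Join every run of text found between a '>' and the next '<'."""
    fragments = re.findall(r'>([^<]+)<', s)
    text = ' '.join(fragments)
    if collapse_whitespace:
        # Assumption: newlines and repeated spaces are not wanted in the result
        text = ' '.join(text.split())
    return text

s = 'nbsp;<br><br>Some text.<br>Some \n more text.<br'
print(repr(strip_tags(s)))                       # 'Some text. Some \n more text.'
print(strip_tags(s, collapse_whitespace=True))   # Some text. Some more text.
```

Note that anything before the first > or after the last < (such as the leading nbsp; here) is dropped, because the pattern only captures text enclosed by both brackets.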

1 Comment

Thanks, but what if s is something like nbsp;<br><br>Some text.<br>Some \n more text.<br, and I need to strip out all the HTML and formatting to just get Some text. Some \n more text.?
