I have some text with HTML tags:

<p><b>Name and LastName</b><br />
Work Title<br /><span class="text-spacer"></span>
</p>

I want to have text in this format:

Name and LastName - Work Title

This is my Python code, but it doesn't work:

import re

text = '''<p><b>Name and LastName</b><br />
    Work Title<br /><span class="text-spacer"></span>
    </p>'''
my_text = re.sub(r'</b><br />', ' - ', text)
    Do not try to parse html with regex. Use something like BeautifulSoup. Commented Oct 24, 2016 at 14:43

1 Answer

I'd use a specialized tool for the job - an HTML Parser, like BeautifulSoup:

In [1]: from bs4 import BeautifulSoup

In [2]: data = """<p><b>Name and LastName</b><br />
    ...: Work Title<br /><span class="text-spacer"></span>
    ...: </p>"""

In [3]: soup = BeautifulSoup(data, "html.parser")

In [4]: soup.p.get_text(separator=" - ", strip=True)
Out[4]: u'Name and LastName - Work Title'

Note the separator argument: it lets you specify a custom string to place between the child nodes' text when getting the text of the parent, which fits your use case nicely. With strip=True, each piece of text is stripped of surrounding whitespace and empty pieces are dropped.

1 Comment

And if I have several items, this code returns only the first one... @alecxe
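
On the follow-up about several items: soup.p only gives the first <p> element, so one option is to loop over soup.find_all("p") and call get_text on each. The following is a minimal sketch, assuming every person is wrapped in its own <p> element as in the question; the names and titles in the sample data are made up for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical sample data: two entries in the same shape as the question's markup.
data = """<p><b>Anna Smith</b><br />
Engineer<br /><span class="text-spacer"></span>
</p>
<p><b>John Doe</b><br />
Designer<br /><span class="text-spacer"></span>
</p>"""

soup = BeautifulSoup(data, "html.parser")

# find_all("p") returns every <p> element, not just the first one.
people = [
    p.get_text(separator=" - ", strip=True)
    for p in soup.find_all("p")
]

for person in people:
    print(person)
```

If the entries are not each in their own <p>, the selector passed to find_all would need to be adjusted to whatever element wraps a single entry.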
