
I am looking for a piece of advice, as I am new to Python.

Let's imagine that I have multiple data blocks similar to the following one:

<td> <a href="address.com" title="title">some title</a> <br /> aaa<br /> bbb<br /> ccc</td>

The number of <br /> tags varies from block to block.

My goal is to extract the data inside each td block and write it to a file, but I am stuck.

Is a regular expression the best approach here?

Thank you in advance.

BeautifulSoup may suit your needs if you're dealing with lots of these data blocks. (Commented Jun 9, 2013 at 19:09)

1 Answer


Parse the HTML with an HTML parser such as BeautifulSoup (pip install beautifulsoup4) rather than with regular expressions:

from bs4 import BeautifulSoup

html = """
<td> <a href="address.com" title="title">some title</a> <br /> aaa<br /> bbb<br /> ccc</td>
"""

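# Name the parser explicitly: "html.parser" ships with Python and keeps a
# bare <td> fragment like this one intact
soup = BeautifulSoup(html, "html.parser")

# Print the text of each <td>; get_text() drops all tags, including <br />
for td in soup.find_all('td'):
    print(td.get_text())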

And the result:

 some title  aaa bbb ccc
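If you need each line of a block as a separate field (for example, to write the data to a file), td.stripped_strings yields the text pieces between the tags with surrounding whitespace removed, so the number of <br /> tags does not matter. The following is a minimal sketch, assuming CSV output is acceptable; the second sample block, the helper name extract_blocks and the file name blocks.csv are hypothetical:

```python
import csv

from bs4 import BeautifulSoup

# Hypothetical sample: two blocks with different numbers of <br /> tags
html = """
<td> <a href="address.com" title="title">some title</a> <br /> aaa<br /> bbb<br /> ccc</td>
<td> <a href="address.com" title="title">other title</a> <br /> ddd</td>
"""


def extract_blocks(markup):
    """Return one list of text pieces per <td>, however many <br /> tags it has."""
    soup = BeautifulSoup(markup, "html.parser")
    # stripped_strings strips each text node and skips whitespace-only ones,
    # so the <br /> tags effectively act as separators between fields
    return [list(td.stripped_strings) for td in soup.find_all("td")]


rows = extract_blocks(html)

# "blocks.csv" is a hypothetical output file; newline="" is what the csv
# module expects for files it writes
with open("blocks.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerows(rows)  # rows may have different lengths

print(rows)
```

This prints [['some title', 'aaa', 'bbb', 'ccc'], ['other title', 'ddd']], and each block becomes one CSV row; blocks with fewer <br /> tags simply produce shorter rows. If you prefer plain text over CSV, td.get_text("\n", strip=True) returns the same pieces joined by newlines.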