0

I need to do some HTML parsing with Python. Suppose I have an HTML file like the one below:

<body>
   <div class="mydiv">
      <p>i want got it</p>
      <div>
           <p> good </p>
           <a> boy  </a>
      </div>
   </div>
</body>

How can I get the content of <div class="mydiv">? That is, I want to get:

      <p>i want got it</p>
      <div>
           <p> good </p>
           <a> boy </a>
      </div>

I have tried HTMLParser, but I found I could not do this with it. Is there any other way? Thanks!

    I'm looking at the Related section on the right, and... Commented Jun 1, 2011 at 8:10

3 Answers

5

With BeautifulSoup it is as simple as:

    # Python 2 with the legacy BeautifulSoup 3 package
    from BeautifulSoup import BeautifulSoup

    html = """
      <body>
        <div class="mydiv">
          <p>i want got it</p>
          <div>
            <p> good </p>
            <a> boy  </a>
          </div>
        </div>
      </body>
    """

    soup = BeautifulSoup(html)
    # findAll returns a list of every matching tag; take the first one
    result = soup.findAll('div', {'class': 'mydiv'})
    tag = result[0]
    print tag.contents

    # Output:
    # [u'\n', <p>i want got it</p>, u'\n', <div>
    # <p> good </p>
    # <a> boy  </a>
    # </div>, u'\n']

2 Comments

But what I get is a list. How can I convert this list to a text file in HTML format?
Join the child tags into one string: from BeautifulSoup import Tag; st = ''.join([str(t) for t in tag if type(t) == Tag]). Then write it out: with open('somename.html', 'w') as f: f.write(st). Something like this.
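
The answer above targets the legacy BeautifulSoup 3 package under Python 2. For readers on Python 3, the same idea might look roughly like the sketch below. It assumes the `beautifulsoup4` package (imported as `bs4`) is installed, which the original answer does not mention, and it uses `decode_contents()` to serialise the children directly instead of joining them by hand. The output file name is the one suggested in the comment.

```python
from bs4 import BeautifulSoup  # assumption: beautifulsoup4 is installed

html = """
<body>
   <div class="mydiv">
      <p>i want got it</p>
      <div>
           <p> good </p>
           <a> boy  </a>
      </div>
   </div>
</body>
"""

# "html.parser" is the parser bundled with Python, so no extra parser is needed.
soup = BeautifulSoup(html, "html.parser")

# find() returns the first matching tag (or None if nothing matches).
tag = soup.find("div", class_="mydiv")

# decode_contents() renders everything inside the tag, text nodes included,
# but not the outer <div class="mydiv"> itself.
inner = tag.decode_contents()

# Write the inner markup to a file, as asked in the comment.
with open("somename.html", "w", encoding="utf-8") as f:
    f.write(inner)
```

Unlike the one-liner in the comment, this keeps the whitespace text nodes between tags, which is usually harmless in HTML.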
4

Use lxml. Or BeautifulSoup.
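
If installing a third-party parser is not an option, the standard library's HTMLParser mentioned in the question can also recover the inner markup, at the cost of some manual bookkeeping. The following is only a minimal sketch for Python 3: the class name `InnerHTMLExtractor` and the helper `inner_html` are made up for illustration, it assumes reasonably well-nested markup, and it drops comments and decodes character references into plain text.

```python
from html.parser import HTMLParser

# Void elements have no closing tag, so they must not change the nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class InnerHTMLExtractor(HTMLParser):
    """Hypothetical helper: collects the markup inside the first
    <div> whose class attribute contains the given class name."""

    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.depth = 0      # 0 = outside the target div, 1 = directly inside it
        self.done = False   # set once the target div has been closed
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        if self.depth:
            # Inside the target: keep the tag exactly as written in the source.
            self.parts.append(self.get_starttag_text())
            if tag not in VOID_TAGS:
                self.depth += 1
        elif tag == "div":
            classes = (dict(attrs).get("class") or "").split()
            if self.class_name in classes:
                self.depth = 1

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> do not change the depth.
        if self.depth:
            self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if not self.depth or tag in VOID_TAGS:
            return
        self.depth -= 1
        if self.depth == 0:
            self.done = True            # this was the target's own </div>
        else:
            self.parts.append("</%s>" % tag)

    def handle_data(self, data):
        if self.depth:
            self.parts.append(data)


def inner_html(markup, class_name):
    parser = InnerHTMLExtractor(class_name)
    parser.feed(markup)
    parser.close()
    return "".join(parser.parts)


html = """
<body>
   <div class="mydiv">
      <p>i want got it</p>
      <div>
           <p> good </p>
           <a> boy  </a>
      </div>
   </div>
</body>
"""

print(inner_html(html, "mydiv"))
```

The depth counter is what distinguishes the nested `</div>` from the one that closes the target, which is the part a naive HTMLParser subclass tends to get wrong.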


1

I would prefer lxml.html.

    import lxml.html as H

    # `html` is the markup string from the question
    doc = H.fromstring(html)
    # xpath() returns a list of all matching elements
    nodes = doc.xpath("//div[@class='mydiv']")
