I'm trying to write a Python program that extracts the text from list items in HTML. I would like to pull out details such as the book being a hardcover and its number of pages. Does anybody know how to do this?

<h2>Product Details</h2>
  <div class="content">
<ul>

<li><b>Hardcover:</b> 156 pages</li>

<li><b>Publisher:</b> Insight Editions; Har/Pstr edition (June 18, 2013)</li>

<li><b>Language:</b> English</li>

<li><b>ISBN-10:</b> 1608871827</li>
<li><b>ISBN-13:</b> 978-1608871827</li>

To parse other information I used:

for definition in soup.findAll('span', {"class": 'bb_price'}):
    definition = definition.renderContents()

but it does not work in this situation, because the values I need are not wrapped in a tag of their own.

1 Answer

Find the b tag by its text, then read its next_sibling, which is the text node that follows the label inside the li.

Working example:

from bs4 import BeautifulSoup

data = """<h2>Product Details</h2>
  <div class="content">
<ul>

<li><b>Hardcover:</b> 156 pages</li>

<li><b>Publisher:</b> Insight Editions; Har/Pstr edition (June 18, 2013)</li>

<li><b>Language:</b> English</li>

<li><b>ISBN-10:</b> 1608871827</li>
<li><b>ISBN-13:</b> 978-1608871827</li></ul></div>"""

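# Name the parser explicitly so the result does not depend on what is installed
soup = BeautifulSoup(data, 'html.parser')

# string= is the current name of the older text= argument (bs4 4.4.0+)
# next_sibling is the text node after <b>; strip() drops its leading space
print(soup.find('b', string='Hardcover:').next_sibling.strip())
print(soup.find('b', string='Publisher:').next_sibling.strip())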

prints:

156 pages
Insight Editions; Har/Pstr edition (June 18, 2013)
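
If you need every field rather than one label at a time, the same next_sibling idea can be applied in a loop. The following is only a sketch, assuming each li has the shape <b>Label:</b> value as in the question; the details dictionary and the loop variable names are my own, not part of the original page or of BeautifulSoup.

```python
from bs4 import BeautifulSoup

data = """<h2>Product Details</h2>
<div class="content">
<ul>
<li><b>Hardcover:</b> 156 pages</li>
<li><b>Publisher:</b> Insight Editions; Har/Pstr edition (June 18, 2013)</li>
<li><b>Language:</b> English</li>
<li><b>ISBN-10:</b> 1608871827</li>
<li><b>ISBN-13:</b> 978-1608871827</li>
</ul>
</div>"""

soup = BeautifulSoup(data, "html.parser")

# Hypothetical result container: label -> value
details = {}

content = soup.find("div", class_="content")
for item in content.find_all("li"):
    label = item.find("b")
    if label is None:
        # Skip list items that do not follow the <b>Label:</b> value shape
        continue
    key = label.get_text(strip=True).rstrip(":")
    sibling = label.next_sibling
    # Assumes the value is a plain text node right after the <b> tag
    details[key] = str(sibling).strip() if sibling is not None else ""

print(details["Hardcover"])
print(details["ISBN-13"])
```

With the sample markup this should give you a dictionary keyed by Hardcover, Publisher, Language, ISBN-10 and ISBN-13. If a value on the real page contains nested tags, next_sibling will only return the first node, so you may need to adapt the loop.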