I have an HTML file:

<html>
 <body>
   <span class="text">One</span>some text1</br>
   <span class="cyrillic">Мир</span>some text2</br>
 </body>
</html>

How can I get "some text1" and "some text2" using lxml in Python?

  • Here's the tutorial: codespeak.net/lxml/tutorial.html Anything specific you don't understand? Commented Nov 15, 2010 at 2:38
  • This tutorial link is defunct. Please remove. Commented Apr 23, 2013 at 21:18

2 Answers

import lxml.html

doc = lxml.html.document_fromstring("""<html>
 <body>
   <span class="text">One</span>some text1</br>
   <span class="cyrillic">Мир</span>some text2</br>
 </body>
</html>
""")

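# following-sibling::text()[1] selects the first text node after each span.
# xpath() returns a list of matches, so txt1 and txt2 are lists of strings.
txt1 = doc.xpath('/html/body/span[@class="text"]/following-sibling::text()[1]')
txt2 = doc.xpath('/html/body/span[@class="cyrillic"]/following-sibling::text()[1]')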
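
Since `xpath()` hands back a list rather than a single string, you will probably want to index into it and strip surrounding whitespace. Here is a minimal sketch of that, plus an alternative that reads each span's `.tail` (the attribute where lxml keeps the text following an element). The variable name `tails` and the `.strip()` calls are my own additions, not part of the answer above; I strip because the parser may keep the newline and indentation after the text.

```python
import lxml.html

doc = lxml.html.document_fromstring("""<html>
 <body>
   <span class="text">One</span>some text1</br>
   <span class="cyrillic">Мир</span>some text2</br>
 </body>
</html>
""")

# xpath() returns a list of matches; take the first one and trim whitespace.
txt1 = doc.xpath(
    '/html/body/span[@class="text"]/following-sibling::text()[1]'
)[0].strip()

# Alternative: collect the trailing text of every span via .tail.
# (.tail is None when nothing follows the element, hence the `or ""`.)
tails = [(span.tail or "").strip() for span in doc.xpath("//span")]

print(txt1)
print(tails)
```

The `.tail` variant avoids hard-coding the class names, which may be handy if the file has more spans than the two shown in the question.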

I use lxml for XML parsing, but BeautifulSoup for HTML. Here's a quick tour, ending with one solution to your question. Hope it helps.

Python 2.6.5 (r265:79359, Mar 24 2010, 01:32:55) 
[GCC 4.0.1 (Apple Inc. build 5493)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from BeautifulSoup import BeautifulSoup as soup
>>> stream = open('bs.html', 'r')
>>> doc = soup(stream.read())
>>> doc.body.span
<span class="text">One</span>
>>> doc.body.span.nextSibling
u'some text1'
>>> x = doc.findAll('span')
>>> for i in x:
...     print unicode(i)
... 
<span class="text">One</span>
<span class="cyrillic">Мир</span>
>>> x = doc('span')
>>> type(x)
<class 'BeautifulSoup.ResultSet'>
>>> for i in x:
...     print unicode(i)
... 
<span class="text">One</span>
<span class="cyrillic">Мир</span>
>>> for i in x:
...     print i.nextSibling
... 
some text1
some text2
>>> 
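
The session above is Python 2 with BeautifulSoup 3 (`from BeautifulSoup import ...`). For anyone on Python 3, here is a rough equivalent, assuming the `bs4` package (BeautifulSoup 4) is installed, where `findAll` and `nextSibling` are spelled `find_all` and `next_sibling`. The inline `html` string and the `texts` list are just for illustration; the original reads the markup from `bs.html`.

```python
from bs4 import BeautifulSoup, NavigableString

html = """<html>
 <body>
   <span class="text">One</span>some text1</br>
   <span class="cyrillic">Мир</span>some text2</br>
 </body>
</html>
"""

# "html.parser" is the parser bundled with Python, so no extra install is needed.
soup = BeautifulSoup(html, "html.parser")

texts = []
for span in soup.find_all("span"):
    sibling = span.next_sibling
    # Keep only plain text nodes; skip None or a following tag.
    if isinstance(sibling, NavigableString):
        texts.append(sibling.strip())

print(texts)
```

The `isinstance` check is a precaution for documents where a span is followed directly by another tag or by nothing at all.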
