How to parse HTML by class name with lxml.etree in Python

Question

import requests
from lxml import etree

req = requests.get(url)
tree = etree.HTML(req.text)

Now, instead of using XPath via tree.xpath(...), I would like to know whether we can search by class name or id, as we do in BeautifulSoup with soup.find('div', attrs={'class': 'myclass'}). I'm looking for something similar in lxml.

Why would you not use XPath? That would seem to do exactly what you want. docs.python.org/2/library/…
– FrobberOfBits, commented May 12, 2014 at 17:41

roippi · Accepted Answer · 2014-05-12 18:22:34Z

2

A far more concise way to do that in bs4 is to use a CSS selector:

soup.select('div.myclass') #  == soup.find_all('div',attrs={'class':'myclass'})

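lxml provides cssselect both as a module (lxml.cssselect, which compiles CSS selectors into XPath expressions) and as a convenience method on Element objects. Since lxml 3.0 the module depends on the separately installed cssselect package.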

import lxml.html

tree = lxml.html.fromstring(req.text)
for div in tree.cssselect('div.myclass'):
    print(div.text_content())  # placeholder: process each matching div here

Alternatively, you can pre-compile the expression and apply it to your Element:

from lxml.cssselect import CSSSelector
selector = CSSSelector('div.myclass')

selection = selector(tree)

edited May 12, 2014 at 18:22

answered May 12, 2014 at 17:45


Bryan Oakley · Answer · 2016-01-28 18:27:07Z

1

You say that you don't want to use XPath but don't explain why. If the goal is to search for a tag with a given class, you can do that easily with XPath.

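For example, to find a div whose class attribute is exactly "foo", you could do something like this. Note the leading dot: find() on an element accepts only relative paths, and an absolute path such as "//div" raises a SyntaxError.

tree.find(".//div[@class='foo']")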
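One caveat with @class='foo' is that it compares the whole attribute value, so an element with class="foo bar" is not matched. Below is a minimal sketch of the difference, using made-up sample markup in place of the fetched page. The concat/normalize-space idiom is a common XPath workaround that I am adding for illustration, not something from the answer itself.

```python
from lxml import etree

# Hypothetical sample markup standing in for req.text.
SAMPLE_HTML = """
<html><body>
  <div class="foo">only foo</div>
  <div class="foo bar">foo and bar</div>
  <div class="foobar">different class</div>
</body></html>
"""

tree = etree.HTML(SAMPLE_HTML)

# Exact comparison: matches only when the whole attribute value is "foo".
exact = [d.text for d in tree.xpath("//div[@class='foo']")]

# Token comparison: pad the value with spaces and look for " foo ".
token_query = "//div[contains(concat(' ', normalize-space(@class), ' '), ' foo ')]"
tokens = [d.text for d in tree.xpath(token_query)]

print(exact)
print(tokens)
```

The token query has to go through xpath(), since find() and findall() support only a limited path subset without XPath functions such as contains().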

answered Jan 28, 2016 at 18:27
