lxml.cssselect CSSSelector doesn't support attributes

Question

I'm trying to parse HTML pages and get items with specific attributes. I'm using lxml.cssselect for the job.

I can't get it to work with an attribute selector. For example, the selector p[itemprop="articleBody"] returns nothing on this page, although the same selector works in Firefox and Chrome.

When I try selectors with no attributes, they do work.

I create the CSSSelector with the html translator.

Is this kind of selector simply not supported by lxml.cssselect? I couldn't find any reference to it in the docs.

Would you mind posting your code please?

– gtlambert, Commented Sep 2, 2015 at 21:36

gtlambert · Accepted Answer · 2015-09-02 21:03:16Z


I don't have much experience with lxml.cssselect (I had a quick go and couldn't even set up the element tree, so I was unable to replicate your exact problem). However, I have had success with the equivalent cssselect() method on lxml.html elements, which may be of use to you.

from lxml import html
import requests

# Download the page
url = 'http://abcnews.go.com/US/wireStory/man-jail-writing-racist-graffiti-refugees-homes-33488053'
page = requests.get(url)

# Parse the HTML and select <p> elements by attribute value
tree = html.fromstring(page.text)
p_elements = tree.cssselect('p[itemprop="articleBody"]')
print(p_elements)

Output:

[<Element p at 0xa503ae8>,
 <Element p at 0xa503db8>,
 <Element p at 0xa503bd8>,
 <Element p at 0xa54b1d8>,
 <Element p at 0xa54b0e8>,
 <Element p at 0xa54b138>,
 <Element p at 0xa54b188>]

Generally, when using lxml I find that selecting elements by XPath is far more flexible than by CSS selector.
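
For comparison, here is a minimal sketch of the same attribute match written as XPath. The inline HTML string is made up for illustration and is not taken from the page above; it only assumes that lxml is installed.

```python
from lxml import html

# Hypothetical sample markup, standing in for a downloaded page
SAMPLE = """
<div>
  <p itemprop="articleBody">First paragraph.</p>
  <p>Unrelated paragraph.</p>
  <p itemprop="articleBody">Second paragraph.</p>
</div>
"""

tree = html.fromstring(SAMPLE)

# XPath equivalent of the CSS selector p[itemprop="articleBody"]
matches = tree.xpath('//p[@itemprop="articleBody"]')
texts = [p.text_content() for p in matches]
print(texts)
```

The @itemprop="articleBody" predicate plays the same role as the bracketed part of the CSS selector.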

1 Comment

zmbq Over a year ago

ARGH! This is someone else's code I'm working on. Turns out they used lxml.html.clean.clean_html to clean the HTML. The default behavior (which they used) is to drop all 'unsafe' attributes, including this one.
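
A rough sketch of that effect, on a made-up snippet rather than the original code: the default Cleaner strips itemprop, while a Cleaner whose safe_attrs whitelist is extended keeps it. The sample markup and the helper name count_article_paragraphs are hypothetical. The import fallback assumes that, on newer lxml releases, the cleaner lives in the separate lxml_html_clean package.

```python
from lxml import html
from lxml.html import defs

try:
    from lxml.html.clean import Cleaner
except ImportError:
    # Assumption: newer lxml ships the cleaner as a separate package
    from lxml_html_clean import Cleaner

# Hypothetical sample markup
SAMPLE = '<div><p itemprop="articleBody">Hello</p><p>Other</p></div>'


def count_article_paragraphs(markup):
    """Count <p> elements that still carry itemprop="articleBody"."""
    tree = html.fromstring(markup)
    return len(tree.xpath('//p[@itemprop="articleBody"]'))


# Default settings: attributes outside the safe whitelist are dropped
default_cleaned = Cleaner().clean_html(SAMPLE)

# Extend the whitelist so that itemprop survives cleaning
keeping_cleaner = Cleaner(safe_attrs=defs.safe_attrs | {"itemprop"})
keeping_cleaned = keeping_cleaner.clean_html(SAMPLE)

print(count_article_paragraphs(SAMPLE))
print(count_article_paragraphs(default_cleaned))
print(count_article_paragraphs(keeping_cleaned))
```

XPath is used for the check only to avoid depending on the cssselect package; a CSS attribute selector would come up empty on the default-cleaned markup for the same reason.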
