Selecting an element with multiple classes in python lxml using xpath

Question

I was trying to scrape a website using Python requests and lxml. I could easily select elements with a single class using html.xpath(), but I can't figure out how to select elements with multiple classes.

I used code like this to select the elements in the page with class "title":

page.xpath('//a[@class="title"]')

However, I couldn't select elements with multiple classes. I looked at a few examples and tried to study XPath, but it seems like lxml.html.xpath() works differently; maybe it's my lack of understanding. I tried a few expressions which didn't work for me. They are given below.

HTML code

<a href="https://www.lovemycosmetic.de/skin1004-madagascar-centella-ampoule-30ml-" class="info text-center" title="SKIN1004 Madagascar Centella Ampoule 30ml"> <strong class="supplier"><font style="vertical-align: inherit;"><font style="vertical-align: inherit;">SKIN1004</font></font></strong><font style="vertical-align: inherit;"><font style="vertical-align: inherit;">SKIN1004 Madagascar Centella Ampoule 30ml</font></font></a>

Test 1:

page.xpath('//a[@class="info text-center"]')

Test 2:

page.xpath("//a[@class='info text-center']")

Test 3:

page.xpath('//a[@class="info.text-center"]')

Test 4:

page.xpath("//a[contains(@class, 'info') and contains(@class, 'text-center')]")

I did a couple more tests too, but I forgot to save the code. It would be great to know how to select elements with multiple classes using lxml.html.xpath().

@Alexander I have edited my question. Would you mind checking it?
– Ajay Pun Magar, Commented Dec 17, 2022 at 19:45
Not the Python code... the HTML. Either post the portion that contains the element you are trying to extract, or a link to the website that contains it. I want to see the HTML because your Test 1 and Test 2 both look correct, but without seeing the HTML it's impossible to say why they aren't working.
– Alexander, Commented Dec 17, 2022 at 19:46
Test2 works for me: a = page.xpath('//a[@class="info text-center"]'); print(a[0].text)
– LMC, Commented Dec 17, 2022 at 20:00

Conal Tuohy · Accepted Answer · 2022-12-17 20:01:36Z

NB: as far as XPath is concerned, the class attribute's value is a string like any other. XPath doesn't automatically parse the value as a list of space-delimited tokens, as a CSS selector would. Later versions of XPath have the function contains-token(), but lxml supports XPath 1.0, in which you basically have to tokenize the class value yourself.

If your class values are literally info text-center then you can test it with the predicate [@class="info text-center"], but that won't match a class value of e.g. text-center info or info text-center foo bar. I'd recommend you use the XPath contains() function, e.g.

//a[contains(@class, "info")][contains(@class, "text-center")]
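One caveat: contains() is a substring test, so contains(@class, "info") would also match a class such as infobox. A common XPath 1.0 idiom for matching whole class tokens pads the normalized attribute value with spaces. The sketch below compares the three approaches; the sample markup and the has_class helper are hypothetical and are not taken from the page in the question.

```python
from lxml import html

# Hypothetical sample markup (not from the original page) covering the edge cases.
SAMPLE = """
<div>
  <a id="exact" class="info text-center">exact order</a>
  <a id="reordered" class="text-center info extra">reordered, extra class</a>
  <a id="lookalike" class="infobox text-center">substring look-alike</a>
  <a id="single" class="info">only one class</a>
</div>
"""


def has_class(name):
    """Build an XPath 1.0 predicate that matches `name` as a whole class token."""
    return f"contains(concat(' ', normalize-space(@class), ' '), ' {name} ')"


root = html.fromstring(SAMPLE)

# Exact string comparison: only the literal value "info text-center" matches.
exact = root.xpath('//a[@class="info text-center"]/@id')

# Substring test: order-independent, but "infobox" also contains "info".
substring = root.xpath(
    '//a[contains(@class, "info")][contains(@class, "text-center")]/@id'
)

# Token test: order-independent and restricted to whole class names.
tokens = root.xpath(f'//a[{has_class("info")}][{has_class("text-center")}]/@id')

print(exact)      # ['exact']
print(substring)  # ['exact', 'reordered', 'lookalike']
print(tokens)     # ['exact', 'reordered']
```

If none of the class names on the page is a substring of another, the plain contains() form is enough; the padded form only matters when look-alike names exist.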


Alexander · Accepted Answer · 2022-12-17 20:03:34Z

Your Test 1 and Test 2 should both work fine. This is the code I used to get the results:

# Import etree directly from lxml rather than through lxml.html.
from lxml import etree
# etree.fromstring parses the snippet as XML; the <a> element becomes the root.
root = etree.fromstring('<a href="https://www.lovemycosmetic.de/skin1004-madagascar-centella-ampoule-30ml-" class="info text-center" title="SKIN1004 Madagascar Centella Ampoule 30ml"> <strong class="supplier"><font style="vertical-align: inherit;"><font style="vertical-align: inherit;">SKIN1004</font></font></strong><font style="vertical-align: inherit;"><font style="vertical-align: inherit;">SKIN1004 Madagascar Centella Ampoule 30ml</font></font></a>')
# Select the first <a> whose class attribute is exactly "info text-center".
elem = root.xpath('//a[@class="info text-center"]')[0]
# Read the href attribute relative to the matched element.
url = elem.xpath('./@href')[0]
print(elem, url)

OUTPUT:

<Element a at 0x1ef01509940> https://www.lovemycosmetic.de/skin1004-madagascar-centella-ampoule-30ml-
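If the same expression returns nothing on the real page, it can help to look at the class strings lxml actually parsed, since an exact comparison fails on any difference in order, spacing, or extra classes. The following is a minimal diagnostic sketch; page_source is a hypothetical stand-in for the downloaded HTML (for example response.text from requests), not the real page.

```python
from collections import Counter

from lxml import html

# Hypothetical stand-in for the downloaded page source.
page_source = """
<html><body>
  <a class="info text-center" href="/a">A</a>
  <a class="info  text-center" href="/b">B</a>
  <a class="text-center info" href="/c">C</a>
</body></html>
"""

page = html.fromstring(page_source)

# Count each raw class string found on <a> elements, exactly as parsed.
class_counts = Counter(page.xpath('//a/@class'))

# repr() makes stray or doubled whitespace inside the values visible.
for value, count in class_counts.items():
    print(repr(value), count)
```

Any value printed here that differs from "info text-center" character for character will not be matched by the exact-comparison predicate.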
