Python scraping - XPath syntax with multiple conditions

Question

I am writing a simple scraper to pull flight prices from Kayak. I am scraping multiple data items (duration, airline, price, etc.) using XPath and storing each in a list of 15 values (the number of results on a Kayak page).

My problem is that the "price" scrape returns more than 15 values, because in addition to the "best" price for each result it also pulls the additional prices displayed alongside it (see screenshot: the large font on the right-hand side vs. the two offers at the bottom left).

I've narrowed down the problem to the following:

1) Overall (working) XPath to pull both values is:

'//a[@class="booking-link "]/span[@class="price option-text"]/span[@class = "price-text"]'

2) The key to distinguish the main price from the additional price lies in the @id string, where the @id for both types of prices is

(i) partly randomly generated,
(ii) contains "-price-text" in both cases and
(iii) contains "extra-info" only in the additional price,

e.g.:
- Main price: //*[@id="pck6-mb-aE-1d84916e1b2-price-text"]
- Additional price: //*[@id="NB5A-extra-info-hmb-tE-15ae5bd2e33-price-text"]

How do I write an XPath that pulls only the main prices, i.e. filters out any element whose @id contains the "extra-info" string? I've tried several ways (examples below) but can't seem to get the syntax right. Any help appreciated, thanks!

Examples tried:

'//a[@class="booking-link "]/span[@class="price option-text"]/span[@class = "price-text" and not[contains(@id,"extra-info")]]'

'//a[@class="booking-link "]//span[@class="price option-text"]//span[[not[contains(@id,"extra-info")]//span[contains(@id,"-price-text")]]'

'//a[@class="booking-link "]/span[@class="price option-text"]/span[len(@id==33)]'


Jack Fleeting · Accepted Answer · 2020-03-28 18:49:27Z

1

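In XPath, not() is a function, so it takes parentheses rather than square brackets. Add it as a second predicate on the last step, something like: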

//a[@class="booking-link "]/span[@class="price option-text"]/span[@class="price-text"][not(contains(@id,"extra-info"))]
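To check the expression above outside the browser, here is a minimal sketch using lxml (an assumption, since the question does not say which library is used). The HTML fragment is hypothetical: it is built only from the class names and the two example ids quoted in the question, and the dollar amounts are made up, so the real Kayak markup may differ.

```python
from lxml import html

# Hypothetical markup modelled on the class names and ids quoted in the
# question; the prices themselves are invented for illustration.
SAMPLE = """
<div>
  <a class="booking-link ">
    <span class="price option-text">
      <span class="price-text" id="pck6-mb-aE-1d84916e1b2-price-text">$420</span>
    </span>
  </a>
  <a class="booking-link ">
    <span class="price option-text">
      <span class="price-text" id="NB5A-extra-info-hmb-tE-15ae5bd2e33-price-text">$455</span>
    </span>
  </a>
</div>
"""

# Two predicates on the last step: the class must match, and the id must
# NOT contain "extra-info". not() and contains() are XPath 1.0 functions.
MAIN_PRICE_XPATH = (
    '//a[@class="booking-link "]'
    '/span[@class="price option-text"]'
    '/span[@class="price-text"][not(contains(@id,"extra-info"))]'
)

tree = html.fromstring(SAMPLE)
main_prices = [el.text_content().strip() for el in tree.xpath(MAIN_PRICE_XPATH)]
print(main_prices)
```

With this sample only the first span survives the filter. Writing the condition inside a single predicate, as in `[@class="price-text" and not(contains(@id,"extra-info"))]`, should behave the same way.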


SeleniumUser · Accepted Answer · 2020-03-28 18:52:41Z

0

You can also use the ancestor axis to get the list of prices. Try the expression below:

//span[@class='custom-text'][contains(text(),'View Deal')]/ancestor::div[@class="multibook-dropdown"]//span[@class = "price-text"]
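A rough sketch of how that ancestor-based expression selects nodes, again using lxml as an assumed library. The page structure here is a guess: the "multibook-dropdown" and "custom-text" class names come from the expression itself, while the "other-block" div and the prices are hypothetical. Whether this separates main prices from extra ones on the real page depends on where Kayak actually places them.

```python
from lxml import html

# Hypothetical layout: only the first block is a "multibook-dropdown" div.
SAMPLE = """
<div>
  <div class="multibook-dropdown">
    <span class="price-text">$420</span>
    <span class="custom-text">View Deal</span>
  </div>
  <div class="other-block">
    <span class="price-text">$455</span>
    <span class="custom-text">View Deal</span>
  </div>
</div>
"""

# Start at each "View Deal" label, climb to the enclosing dropdown div via
# the ancestor axis, then collect every price span beneath that div.
ANCESTOR_XPATH = (
    "//span[@class='custom-text'][contains(text(),'View Deal')]"
    '/ancestor::div[@class="multibook-dropdown"]'
    '//span[@class = "price-text"]'
)

tree = html.fromstring(SAMPLE)
dropdown_prices = [el.text_content().strip() for el in tree.xpath(ANCESTOR_XPATH)]
print(dropdown_prices)
```

Only prices inside a div with class "multibook-dropdown" are returned; the price in the other block is skipped because its label has no matching ancestor.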
