I am writing a simple scraper to pull flight prices from Kayak. I scrape several data items (duration, airline, price, etc.) using XPath and store each in a list of 15 values (the number of results on a Kayak page).

My problem is that the "price" scrape returns more than 15 values because, in addition to the "best" result, it also pulls the additional prices displayed for each result (see screenshot: large font on the right-hand side vs. two offers at the bottom left).

I've narrowed down the problem to the following:

1) Overall (working) XPath to pull both values is:

'//a[@class="booking-link "]/span[@class="price option-text"]/span[@class = "price-text"]'

2) The key to distinguishing the main price from the additional price lies in the @id string. For both types of price, the @id

  • (i) is partly randomly generated,
  • (ii) contains "-price-text" and
  • (iii) contains "extra-info" only for the additional price,

    e.g.:

    • Main price: //*[@id="pck6-mb-aE-1d84916e1b2-price-text"]
    • Additional price: //*[@id="NB5A-extra-info-hmb-tE-15ae5bd2e33-price-text"]

How do I write an XPath that pulls only the main prices, i.e. filters out any elements whose @id contains the "extra-info" string? I've tried several ways (examples below) but can't seem to get the syntax right. Any help appreciated, thanks!

Examples tried:

'//a[@class="booking-link "]/span[@class="price option-text"]/span[@class = "price-text" and not[contains(@id,"extra-info")]]'

'//a[@class="booking-link "]//span[@class="price option-text"]//span[[not[contains(@id,"extra-info")]//span[contains(@id,"-price-text")]]'

'//a[@class="booking-link "]/span[@class="price option-text"]/span[len(@id==33)]' 


2 Answers

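not() is an XPath function, so it is called with parentheses rather than square brackets, and it can go in its own predicate. Try something like: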

//a[@class="booking-link "]/span[@class="price option-text"]/span[@class="price-text"][not(contains(@id,"extra-info"))]
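
To check the expression outside the browser, here is a minimal sketch using lxml (an assumption, since the question does not say which library runs the XPath). The SAMPLE markup and the prices in it are invented to mimic the structure described in the question, and main_prices is a hypothetical helper name.

```python
from lxml import html

# Invented markup mimicking the structure described in the question:
# one main price and one "extra-info" price. The dollar amounts are made up.
SAMPLE = """
<div>
  <a class="booking-link ">
    <span class="price option-text">
      <span class="price-text" id="pck6-mb-aE-1d84916e1b2-price-text">$310</span>
    </span>
  </a>
  <a class="booking-link ">
    <span class="price option-text">
      <span class="price-text" id="NB5A-extra-info-hmb-tE-15ae5bd2e33-price-text">$325</span>
    </span>
  </a>
</div>
"""

# not() and contains() are XPath 1.0 functions, so both take parentheses.
MAIN_PRICE_XPATH = (
    '//a[@class="booking-link "]'
    '/span[@class="price option-text"]'
    '/span[@class="price-text"][not(contains(@id, "extra-info"))]'
)


def main_prices(page_source):
    """Return the text of every price span whose id lacks "extra-info"."""
    tree = html.fromstring(page_source)
    return [node.text_content().strip() for node in tree.xpath(MAIN_PRICE_XPATH)]


print(main_prices(SAMPLE))
```

On the real page, page_source would be the HTML the scraper has already fetched; if the result list still holds more than 15 items, the class values or nesting on the live site probably differ from what is assumed here.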

You can also use the ancestor axis to get the list of prices. Try the solution below:

//span[@class='custom-text'][contains(text(),'View Deal')]/ancestor::div[@class="multibook-dropdown"]//span[@class = "price-text"]
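
A rough sketch of how this ancestor-based expression behaves, again using lxml as an assumed library. The markup below is invented: the question does not show where the "View Deal" label or the multibook-dropdown container sit on the real page, so the nesting and prices here are guesses for illustration only, and dropdown_prices is a hypothetical helper name.

```python
from lxml import html

# Invented markup: one container with a "View Deal" label and a price,
# plus a second container without the label. Real Kayak markup may differ.
SAMPLE = """
<div>
  <div class="multibook-dropdown">
    <span class="custom-text">View Deal</span>
    <span class="price-text" id="pck6-mb-aE-1d84916e1b2-price-text">$310</span>
  </div>
  <div class="other-offers">
    <span class="price-text" id="NB5A-extra-info-hmb-tE-15ae5bd2e33-price-text">$325</span>
  </div>
</div>
"""

# Start at the "View Deal" label, climb to its enclosing dropdown container,
# then collect every price span inside that container.
ANCESTOR_XPATH = (
    "//span[@class='custom-text'][contains(text(),'View Deal')]"
    "/ancestor::div[@class='multibook-dropdown']"
    "//span[@class='price-text']"
)


def dropdown_prices(page_source):
    """Return price texts found under containers that hold a 'View Deal' label."""
    tree = html.fromstring(page_source)
    return [node.text_content().strip() for node in tree.xpath(ANCESTOR_XPATH)]


print(dropdown_prices(SAMPLE))
```

This variant depends on the page layout rather than on the @id pattern, so it only filters out the extra prices if they sit outside the container that holds the label.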
