I'm trying to make a simple scraping loop to pick up titles from dynamic pages. I've made a small script that works the way I expected. Here is the working script:

from selenium import webdriver
driver = webdriver.Chrome('C:/Users/user/Downloads/chromedriver_win32/chromedriver.exe')

url = "https://www.youtube.com/user/LinusTechTips/videos"
driver.get(url)

videos = driver.find_elements_by_xpath('.//*[@id="dismissable"]')

for video in videos:
    title = video.find_element_by_xpath('.//*[@id="video-title"]').text
    print(title)

It correctly iterates over the divs containing titles and other details and scrapes the titles. But this script only seems to work on YouTube. I've tried it on Craigslist, Amazon, bookstoscrape, Rightmove and Hostelworld, but it doesn't work on any of those pages. Here is the script for Hostelworld:

from selenium import webdriver
driver = webdriver.Chrome('C:/Users/user/Downloads/chromedriver_win32/chromedriver.exe')

url = "https://www.hostelworld.com/s?q=New%20York,%20New%20York,%20USA&country=USA&city=New%20York&type=city&id=13&from=2020-08-14&to=2020-08-16&guests=2&page=1"

driver.get(url)

cards = driver.find_elements_by_xpath('.//*[@id="__layout"]/div/div[1]/div[4]/div/div/div[3]')

for card in cards:
    title = card.find_element_by_xpath('.//*[@id="__layout"]/div/div[1]/div[4]/div/div/div[3]/div[2]/div[1]/h2/a').text
    print(title)

I'm pretty sure the cards class name is correct from finding it with a search in Chrome dev tools. I think title xpath is correct because it prints correctly if I use it outside the loop. I think the loop is correct too because if I change the cards variable to:

cards = driver.find_elements_by_class_name('property-card')

it prints title once for every card on the page.

But when I add . to the title xpath it returns an error saying "Message: no such element: Unable to locate element: ...". I'm using . to prepend the expression so it only searches the parent element getting iterated through, not the whole page. But for some reason adding . throws the error on all websites I tried except youtube.

I'm trying to stick to xpaths as much as possible because not all websites have good class and id conventions.
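
One likely reason the leading `.` fails here: `.//` limits the search to descendants of the element being iterated, but `__layout` is an ancestor of each card, so a path that begins at `@id="__layout"` can never match from inside a card. The YouTube script works because `video-title` sits inside each `dismissable` element. Below is a minimal sketch of that scoping rule, using the standard library's ElementTree rather than Selenium; the markup is a made-up fragment and not Hostelworld's real DOM.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: a page wrapper containing two cards.
PAGE = """
<div id="__layout">
  <div class="property-card"><h2><a>The Local NYC</a></h2></div>
  <div class="property-card"><h2><a>HI NYC Hostel</a></h2></div>
</div>
"""

root = ET.fromstring(PAGE)
cards = root.findall(".//div[@class='property-card']")

# Relative path that starts from an ancestor's id: no descendant of a
# card carries that id, so every lookup comes back empty.
ancestor_hits = [card.find(".//*[@id='__layout']//a") for card in cards]

# Relative path written from the card's own point of view: one title per card.
titles = [card.find(".//h2/a").text for card in cards]

print(ancestor_hits)  # [None, None]
print(titles)         # ['The Local NYC', 'HI NYC Hostel']
```

In Selenium terms, the title XPath inside the loop would need to describe the route from the card down to the link (something of the shape `.//h2/a`), not the route from the page root.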

  • Have you tried an explicit wait for the card elements to load? What count do you get from print(len(cards)) before the loop? Commented May 14, 2020 at 0:56
  • cards = driver.find_elements_by_xpath('.//*[@id="__layout"]/div/div[1]/div[4]/div/div/div[3]') This XPath only matches the first result; I checked in Chrome developer tools. You can try this XPath instead: //p[normalize-space(text())='All properties']/following-sibling::div[@class='property-card' and @data-v-c587ed30]. It matches 15 results (not including the featured result). Commented May 14, 2020 at 1:11
  • @supputuri Thank you, yes it seems like I wasn't waiting for the elements to be visible. The answer from KunduK works well for getting only titles. I'm now trying to figure out how to get more than one element. Commented May 14, 2020 at 14:30

1 Answer

To get the titles of all properties, use WebDriverWait() to wait for visibility_of_all_elements_located() with the following CSS selector.

url = "https://www.hostelworld.com/s?q=New%20York,%20New%20York,%20USA&country=USA&city=New%20York&type=city&id=13&from=2020-08-14&to=2020-08-16&guests=2&page=1"
driver.get(url)
cards=WebDriverWait(driver,10).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR,"div.property-card h2.title.title-6>a")))
for card in cards:
    title = card.text
    print(title)

Output:

The Local NYC
HI NYC Hostel
NY Moore Hostel
Broadway Hotel n Hostel
Q4 Hotel
American Dream Hostel
Giorgio Hotel
Freehand New York
West Side YMCA
Hotel 31
Vanderbilt YMCA
Union Hotel Brooklyn
Victorian Inn
Central Park West Hostel
Jazz on the Park Youth Hotel
The Jane
Nesva Hotel
John Hotel

Note that you need the following imports.

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
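
For context, an explicit wait is essentially a polling loop: it calls a condition repeatedly until the condition returns something truthy or the timeout runs out. The sketch below is a simplified stand-in written with the standard library only, not Selenium's actual implementation; `wait_until`, its parameters, and the fake `find_cards` condition are hypothetical names used for illustration.

```python
import time

def wait_until(condition, timeout=10.0, poll=0.5):
    """Call condition() until it returns a truthy value or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(poll)

# Fake condition: the "page" only has cards from the third poll onwards,
# imitating content that is rendered by JavaScript after the initial load.
attempts = {"count": 0}

def find_cards():
    attempts["count"] += 1
    return ["card 1", "card 2"] if attempts["count"] >= 3 else []

cards = wait_until(find_cards, timeout=2.0, poll=0.01)
print(cards, attempts["count"])  # ['card 1', 'card 2'] 3
```

Without a wait, a `find_elements` call runs once, right after `driver.get()`, and returns an empty list if the cards have not rendered yet, so the loop body never executes.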

Updated with price.

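from selenium.common.exceptions import NoSuchElementException

url = "https://www.hostelworld.com/s?q=New%20York,%20New%20York,%20USA&country=USA&city=New%20York&type=city&id=13&from=2020-08-14&to=2020-08-16&guests=2&page=1"
driver.get(url)
# Wait for the parent cards, then search inside each card for its children.
cards = WebDriverWait(driver, 10).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "div.property-card")))
for card in cards:
    try:
        # find_element(By..., ...) is relative to the card, not the whole page.
        title = card.find_element(By.CSS_SELECTOR, "h2.title.title-6>a").text
        print(title)
        price = card.find_element(By.CSS_SELECTOR, "p.price.title-5").text
        print(price)
    except NoSuchElementException:
        # Skip cards that are missing a title or a price.
        continue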

Output:

The Local NYC
€45
HI NYC Hostel
€41
NY Moore Hostel
€158
Broadway Hotel n Hostel
€73
Freehand New York
€95
Q4 Hotel
€37
Giorgio Hotel
€158
American Dream Hostel
€128
West Side YMCA
€87
Vanderbilt YMCA
€89
Hotel 31
€74
Union Hotel Brooklyn
€128
Victorian Inn
€88
Central Park West Hostel
€42
The Jane
€115
Jazz on the Park Youth Hotel
€78
Nesva Hotel
€136
John Hotel
€165
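
The pattern in the loop above extends to any number of fields: wait once for the parent cards, then look up each child relative to its card. Below is a minimal sketch, assuming Selenium 4's `find_elements(by, value)` method, which returns an empty list rather than raising when nothing matches. The helper name `scrape_card` and the `FIELDS` mapping are hypothetical; the selectors are the ones used in this answer.

```python
# "css selector" is the string value of selenium's By.CSS_SELECTOR constant,
# written out here so the helper itself has no import requirements.
CSS = "css selector"

# Field name -> selector relative to one card (selectors taken from the answer).
FIELDS = {
    "title": "h2.title.title-6>a",
    "price": "p.price.title-5",
}

def scrape_card(card, fields=FIELDS):
    """Return {field: text or None} for a single card element."""
    row = {}
    for name, selector in fields.items():
        matches = card.find_elements(CSS, selector)  # [] when the child is absent
        row[name] = matches[0].text if matches else None
    return row

# Usage with a live driver (not run here):
#   cards = WebDriverWait(driver, 10).until(
#       EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "div.property-card")))
#   rows = [scrape_card(card) for card in cards]
```

Using the plural `find_elements` avoids the try/except block, and a missing field shows up as `None` in the row instead of silently dropping the whole card.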

7 Comments

That's really interesting. That solution works great for one element, but how are more elements added to the loop? I tried: ...located((By.CSS_SELECTOR,"div.property-card h2.title.title-6>a"), (By.CSS_SELECTOR,"div.property-card h2.price title-5>a"))) but it threw errors. I also tried using a second loop just for price. It was clunky at best and didn't work either: prices=WebDriverWait(driver,10).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR,"div.property-card h2.price title-5>a"))) for price in prices: p = price.text print(p)
@123samueld : Can you explain what you mean by adding more elements? I'm not following your point.
@123samueld : No need to post an answer; please update your original post to show what is not working in the code.
Sorry, I'll try to be clearer. Your answer works great for the titles element. But titles on their own aren't enough and I tried to scrape prices in the "property-cards" parent element. I first tried to add another element to the WebDriverWait method, that didn't work. Then I added a prices variable with the WDW method and a loop just for prices. That didn't work either. How can more than one child element be scraped from the same parent element being iterated through? I couldn't find documentation to cover it.
@123samueld : Your CSS selector is wrong. I have made changes; try now: prices=WebDriverWait(driver,10).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR,"div.property-card h2.price.title-5>a"))) for price in prices: p = price.text print(p)