
Here's the link to the website: website

I would like to get the links of all the hotels in this location.

Here's my script:

import time

import numpy as np
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

PATH = r"driver\chromedriver.exe"  # raw string so "\c" is not read as an escape sequence

options = webdriver.ChromeOptions()
options.add_argument("--disable-gpu")
options.add_argument("--window-size=1200,900")
options.add_argument('enable-logging')

driver = webdriver.Chrome(service=Service(PATH), options=options)

driver.get('https://fr.hotels.com/search.do?destination-id=10398359&q-check-in=2021-06-24&q-check-out=2021-06-25&q-rooms=1&q-room-0-adults=2&q-room-0-children=0&sort-order=BEST_SELLER')

# Close the cookie banner if present; the lookup itself can fail, so it belongs inside the try
try:
    driver.find_element(By.XPATH, '//button[@class="uolsaJ"]').click()
except Exception:
    pass

for i in range(30):
    driver.execute_script("window.scrollBy(0, 1000)")
    time.sleep(5)

time.sleep(5)

my_elems = driver.find_elements(By.XPATH, '//a[@class="_61P-R0"]')

links = [my_elem.get_attribute("href") for my_elem in my_elems]

X = np.array(links)
print(X.shape)
#driver.close()

But I can't find a way to tell the script: scroll down until there is nothing more to scroll.

I tried changing these parameters:

for i in range(30):
    driver.execute_script("window.scrollBy(0, 1000)")
    time.sleep(30)

I changed the time.sleep() duration, the scroll offset of 1000, and so on, but my output keeps changing, and not in the right way.

output

As you can see, the number of links scraped differs a lot from run to run. How can I make my script scrape the same amount each time? Not necessarily every link, but at least a stable number.

Here it scrolls, at some point seems to get stuck, and then scrapes whatever links it has loaded at that moment. That's not what I want.

2 Answers


There are several issues here.

  1. You collect the elements and their links only AFTER you have finished scrolling, whereas you should do it inside the scrolling loop.
  2. You should wait until the cookie banner appears before closing it.
  3. You can scroll until the footer element is present.

Something like this:
import time

import numpy as np
from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

PATH = r"driver\chromedriver.exe"  # raw string so "\c" is not read as an escape sequence

options = webdriver.ChromeOptions()
options.add_argument("--disable-gpu")
options.add_argument("--window-size=1200,900")
options.add_argument('enable-logging')

driver = webdriver.Chrome(service=Service(PATH), options=options)
wait = WebDriverWait(driver, 20)

driver.get('https://fr.hotels.com/search.do?destination-id=10398359&q-check-in=2021-06-24&q-check-out=2021-06-25&q-rooms=1&q-room-0-adults=2&q-room-0-children=0&sort-order=BEST_SELLER')

# Wait for the cookie banner button to be visible before clicking it
wait.until(EC.visibility_of_element_located((By.XPATH, '//button[@class="uolsaJ"]'))).click()

def is_element_visible(xpath):
    """Return True if the element at `xpath` becomes visible within 2 seconds."""
    wait1 = WebDriverWait(driver, 2)
    try:
        wait1.until(EC.visibility_of_element_located((By.XPATH, xpath)))
        return True
    except TimeoutException:
        return False

links = set()  # accumulated across scroll passes; the set drops duplicates

while not is_element_visible("//footer[@id='footer']"):
    # Collect the links INSIDE the loop so cards loaded along the way are kept
    my_elems = driver.find_elements(By.XPATH, '//a[@class="_61P-R0"]')
    links.update(my_elem.get_attribute("href") for my_elem in my_elems)

    X = np.array(list(links))
    print(X.shape)

    driver.execute_script("window.scrollBy(0, 1000)")
    time.sleep(5)

#driver.close()
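The points above can also be combined without relying on a particular footer element: collect the links after every scroll, and stop once the page height has stopped growing for a few consecutive passes, with a hard cap on the number of passes. This is a minimal sketch that assumes the page keeps loading more results (and so grows taller) as you scroll until there is nothing left. The Selenium calls are passed in as plain functions so the logic can be tried without a browser; `scroll_and_collect`, `patience`, `max_rounds` and `FakeInfinitePage` are hypothetical names for illustration.

```python
from typing import Callable, Dict, Iterable, List, Optional


def scroll_and_collect(
    scroll_once: Callable[[], None],
    page_height: Callable[[], int],
    visible_links: Callable[[], Iterable[Optional[str]]],
    patience: int = 3,
    max_rounds: int = 200,
) -> List[str]:
    """Scroll repeatedly, collecting links after every pass.

    Stops when the page height has not grown for `patience` consecutive
    passes, or after `max_rounds` passes at most. Returns the links in
    first-seen order, without duplicates.
    """
    seen: Dict[str, None] = {}  # dict keys act as an insertion-ordered set

    def collect() -> None:
        for href in visible_links():
            if href:  # skip None or empty hrefs
                seen.setdefault(href, None)

    last_height = page_height()
    unchanged = 0
    for _ in range(max_rounds):
        collect()  # grab what is loaded now, before scrolling further
        scroll_once()
        new_height = page_height()
        if new_height == last_height:
            unchanged += 1
            if unchanged >= patience:
                break  # nothing new has loaded for a while: assume the end
        else:
            unchanged = 0
            last_height = new_height
    collect()  # pick up anything loaded by the final scroll
    return list(seen)


class FakeInfinitePage:
    """Hypothetical stand-in for a results page that loads `batch` more links per scroll."""

    def __init__(self, total: int = 95, batch: int = 20) -> None:
        self.total = total
        self.batch = batch
        self.loaded = min(total, batch)

    def scroll_once(self) -> None:
        self.loaded = min(self.total, self.loaded + self.batch)

    def page_height(self) -> int:
        return self.loaded * 100  # height grows with the number of loaded cards

    def visible_links(self) -> List[str]:
        return [f"https://example.com/hotel/{i}" for i in range(self.loaded)]


if __name__ == "__main__":
    page = FakeInfinitePage()
    links = scroll_and_collect(page.scroll_once, page.page_height, page.visible_links)
    print(len(links))  # → 95
```

With Selenium, `scroll_once` could run `driver.execute_script("window.scrollBy(0, 1000)")` followed by a short `time.sleep(...)`, `page_height` could return `driver.execute_script("return document.body.scrollHeight")`, and `visible_links` could return `[a.get_attribute("href") for a in driver.find_elements(By.XPATH, '//a[@class="_61P-R0"]')]`. The pause inside `scroll_once` matters: if the height is read before the next batch has rendered, the loop may stop early and give an unstable count again.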

Comments

Thanks Prophet, always here to help :) I will check your code asap
I have updated the answer since I think it was a problem there. Now it should be better. One day I will learn Python :)
while not find_elements_by_xpath("//footer[@id='footer']"): NameError: name 'find_elements_by_xpath' is not defined
@RandallCloud I fixed that more than 3 hours ago... See the updated answer
It seems that doesn't do anything. The page doesn't scroll, and the script ends with nothing.

You can try this: locate, directly in the DOM, some element that only appears at the bottom of the page, and check it with Selenium's .is_displayed() method, which returns True/False:

# https://stackoverflow.com/a/57076690/15164646
import time
from selenium.webdriver.common.by import By

while True:
    # "#message" is the "No more results" element at the bottom of a YouTube search.
    # find_elements() returns an empty list (instead of raising) while it is absent,
    # and is_displayed() is False while it is present but hidden.
    end_markers = driver.find_elements(By.CSS_SELECTOR, '#message')
    end_result = bool(end_markers) and end_markers[0].is_displayed()
    driver.execute_script("var scrollingElement = (document.scrollingElement || document.body);scrollingElement.scrollTop = scrollingElement.scrollHeight;")
    time.sleep(2)  # give lazily loaded results time to render

    # further code below

    # once the element is displayed, break out of the while loop
    if end_result:
        break
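The same loop can be given an upper bound so it cannot run endlessly if the end-of-results element never shows up (for example because the selector does not match on a different site). A minimal sketch with the Selenium calls passed in as functions; `scroll_until_end_marker`, `max_rounds` and `pause` are hypothetical names, and the default values are guesses rather than tuned settings.

```python
import time
from typing import Callable


def scroll_until_end_marker(
    end_marker_displayed: Callable[[], bool],
    scroll_to_bottom: Callable[[], None],
    max_rounds: int = 100,
    pause: float = 1.0,
) -> bool:
    """Scroll to the bottom until end_marker_displayed() returns True.

    Returns True if the end marker showed up, or False if it did not
    appear within `max_rounds` scrolls, so the loop can never run forever.
    """
    for _ in range(max_rounds):
        if end_marker_displayed():
            return True
        scroll_to_bottom()
        time.sleep(pause)  # give lazily loaded results time to render
    return end_marker_displayed()  # one last check after the final scroll


if __name__ == "__main__":
    # Hypothetical stand-in: the marker appears after three scrolls.
    state = {"scrolls": 0}

    def fake_scroll() -> None:
        state["scrolls"] += 1

    found = scroll_until_end_marker(lambda: state["scrolls"] >= 3, fake_scroll, pause=0)
    print(found, state["scrolls"])  # → True 3
```

With Selenium, `end_marker_displayed` could be `lambda: any(e.is_displayed() for e in driver.find_elements(By.CSS_SELECTOR, "#message"))`, which returns False rather than raising while the element is absent, and `scroll_to_bottom` could run the same `execute_script` call used above. A False return value tells you the cap was hit, so you can report it instead of silently scraping a partial list.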

I wrote a blog post where I used this method to scrape YouTube Search.

Comments

The script seems to run endlessly?
Indeed! Thank you for letting me know! As soon as it's fixed I'll add another comment here so you know.
Hey @RandallCloud! I updated the answer. Now it breaks out of the while loop when the element at the bottom of the page is located.
