I am trying to scrape all the hotels on this page, but even after executing a scroll-down script, page_source contains only the HTML for the 11 hotels that were loaded initially.

How can I get the full page source after scrolling down, so that I can scrape all the hotels?

If driver.execute_script does load the entire page, how do I store the page source of the entire page in a variable?

PS: This is for educational purposes only.

from selenium import webdriver
import re
import pandas as pd
import time
chrome_path = r"C:\Users\ajite\Desktop\web scraping\chromedriver.exe"
driver = webdriver.Chrome(chrome_path)
driver.get('https://www.makemytrip.com/mmthtl/site/hotels/search?checkin=02252018&checkout=02262018&roomStayQualifier=1e0e&city=GOI&searchText=Goa,%20India&country=IN')

driver.implicitly_wait(3)
driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
time.sleep(5)

two_hotels = driver.find_elements_by_xpath('//*[@id="hotel_card_list"]/div')
  • Getting all the hotels is not the same as getting page_source; you probably need a list of hotels. Let me know if I am right. Commented Feb 13, 2018 at 5:50

1 Answer

Your scroll does not appear to trigger loading of the remaining hotels. Instead of:

driver.execute_script("window.scrollTo(0, document.body.scrollHeight);") 

try scrolling down in small steps:

for i in range(25):  # tune the number of scrolls to what the page needs
  driver.execute_script('window.scrollBy(0, 400)')
  time.sleep(1)

The code I tried:

import time
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()

driver.get("https://www.makemytrip.com/mmthtl/site/hotels/search?checkin=02252018&checkout=02262018&roomStayQualifier=1e0e&city=GOI&searchText=Goa,%20India&country=IN")
driver.implicitly_wait(3)

for i in range(25):  # tune the number of scrolls to what the page needs
  driver.execute_script('window.scrollBy(0, 400)')
  time.sleep(1)

time.sleep(10)  # extra time so the remaining cards can load

# find_elements(By.XPATH, ...) is used because newer Selenium releases no longer provide find_elements_by_xpath
two_hotels = driver.find_elements(By.XPATH, '//*[@id="hotel_card_list"]/div')

two_hotels now contains more elements.

With 25 scrolls I got 42 hotel elements. You will probably need to tune the number of scrolls and the wait times to get everything you need.
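
Rather than guessing the number of scrolls, one option is to keep scrolling until the number of hotel cards stops growing. The following is only a minimal sketch under some assumptions: the helper name scroll_until_stable and its parameters (step, pause, max_scrolls, patience) are hypothetical, and it assumes the page keeps loading cards as the window is scrolled in small steps. It relies only on driver.execute_script and driver.find_elements, and passes the locator strategy as the string "xpath", which is the value of By.XPATH.

```python
import time


def scroll_until_stable(driver, xpath, step=400, pause=1.0,
                        max_scrolls=100, patience=3):
    """Scroll down in small steps until the element count stops growing.

    Hypothetical helper: `patience` is how many consecutive scrolls
    without new elements are tolerated before giving up, and
    `max_scrolls` is a hard upper bound on the number of scrolls.
    """
    last_count = 0
    stalled = 0
    for _ in range(max_scrolls):
        # Scroll by `step` pixels; the value is passed as a script argument.
        driver.execute_script("window.scrollBy(0, arguments[0]);", step)
        time.sleep(pause)  # give lazily loaded cards time to appear

        # "xpath" is the string value of selenium's By.XPATH.
        count = len(driver.find_elements("xpath", xpath))
        if count > last_count:
            last_count = count
            stalled = 0
        else:
            stalled += 1
            if stalled >= patience:
                break
    return driver.find_elements("xpath", xpath)
```

With the driver from the code above, this could be called as hotels = scroll_until_stable(driver, '//*[@id="hotel_card_list"]/div'). Once the scrolling has finished, html = driver.page_source stores the HTML of the page as it stands at that moment, which should then include the cards loaded by scrolling.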
