
I have been trying to scrape dynamic content (Restaurant Title, Rating, Type of Restaurant) from DoorDash. It is not just one page that I am trying to scrape, but multiple, probably around 100-1000 pages on a single DoorDash domain.

I got a "single scrape" to work; however, when I used the code below, it gave me a long error.

def ScrapeDoorDash(df):
    for i in df:
        url = df[i]
        print(url)
        driver = webdriver.Chrome(ChromeDriverManager().install())
        driver.get(url)
        restaurantname = driver.find_element_by_xpath('//*[@id="root"]/div/div[1]/div[2]/div/div[1]/header/div[2]/h1').text
        rating = driver.find_element_by_xpath('//*[@id="root"]/div/div[1]/div[2]/div/div[1]/header/div[2]/div[1]/div[3]/div/span[1]').text
        #restauranttype = driver.find_element_by_xpath('//*[@id="root"]/div/div[1]/div[2]/div/div[1]/header/div[2]/div[1]/div[1]/span').text
        #Store into / print out
        print(restaurantname, rating, restauranttype)

The XPaths are already correct, but Selenium opens Chrome every time and needs the page to finish loading before the content can be scraped. With the code I provided above, the error popped up before the first page had even finished loading.

Is there a way to implement some code to "pause the for loop" so each page can load and be scraped before moving on to the next item in the URL dataframe?

Please use the below to create the URL dataframe:

url = ["https://www.doordash.com/store/popeyes-toronto-254846/en-CA", "https://www.doordash.com/store/sunset-grill-toronto-211003/en-CA"]

df = pd.DataFrame(url, columns=["URL"])
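For reference, this builds a single-column dataframe, so the loop should read the link values from that column rather than iterating the dataframe object itself (a minimal sketch, assuming the column is named "URL" as above):

import pandas as pd

df = pd.DataFrame(url, columns=["URL"])

# iterating the dataframe itself yields column labels, not rows,
# so pull the values out of the "URL" column instead
for link in df["URL"]:
    print(link)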

The error message (the full traceback is much longer) says no such element (NoSuchElementException). However, when I tried the XPaths individually after the page was done loading, the elements were found and the right content was scraped. It is only when I try to scrape multiple pages that it gives me an error.

Any help would be appreciated!

3 Answers

You can pause the script by using the time module.

import time

time.sleep(2)

Put it between the request and the scrape lines.

The script will pause for the time you put in the parentheses, in seconds; in this case, 2 seconds.

Do some tests and use the shortest time that lets the script work reliably.
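For example, dropped between the request and the scrape in the loop from the question (a sketch, assuming two seconds is enough for the page to render):

import time

driver.get(url)   # request the page
time.sleep(2)     # pause so the dynamic content has time to render
restaurantname = driver.find_element_by_xpath('//*[@id="root"]/div/div[1]/div[2]/div/div[1]/header/div[2]/h1').text  # scrape after the pause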

As Fabix said, the time module will allow you to sleep your code before you retrieve the elements from the webpage.

Additionally, to prevent the Chrome driver from opening a new instance for every URL, open the browser outside of the loop.

import time

from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager

def ScrapeDoorDash(urls):
    with webdriver.Chrome(ChromeDriverManager().install()) as driver:
        for url in urls:
            print(url)
            driver.get(url)
            time.sleep(3)
            restaurantname = driver.find_element_by_xpath('//*[@id="root"]/div/div[1]/div[2]/div/div[1]/header/div[2]/h1').text
            rating = driver.find_element_by_xpath('//*[@id="root"]/div/div[1]/div[2]/div/div[1]/header/div[2]/div[1]/div[3]/div/span[1]').text
            restauranttype = driver.find_element_by_xpath('//*[@id="root"]/div/div[1]/div[2]/div/div[1]/header/div[2]/div[1]/div[1]/span').text
            # store / print out the results
            print(restaurantname, rating, restauranttype)

By using with webdriver.Chrome(ChromeDriverManager().install()) as driver:, the driver connection will be closed automatically when you exit the with statement.
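For example, you could then call the function with the URL column from the question's dataframe (assuming the column is named "URL"):

# the for loop then iterates directly over the link strings
ScrapeDoorDash(df["URL"])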

I suggest using waits. They are probably better than time.sleep because you don't have to find the perfect time yourself and they are more reliable, though they make the code a bit bigger (you can wrap the wait in a function, as in the sketch after the snippet below):

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException

xpath = "..."
wait_time = 10
# the driver will try to find the element by xpath for 10 seconds;
# if it cannot find it, a TimeoutException is raised

interval = 0.1  # time between attempts to locate the xpath (0.5 seconds by default)

# returns the found element
elem = WebDriverWait(driver, wait_time, interval).until(EC.presence_of_element_located((By.XPATH, xpath)))
some = elem.text

To avoid opening the browser every time, see ZacLanghorne's answer.
