
I have been trying to scrape dynamic content (Restaurant Title, Rating, Type of Restaurant) from DoorDash. It is not just one page that I am trying to scrape, but multiple, probably around 100-1000 pages on a single DoorDash domain.

I got a "single scrape" to work; however, when I used the code below, it gave me a long error.

def ScrapeDoorDash(df):
    for i in df:
        url = df[i]
        print(url)
        driver = webdriver.Chrome(ChromeDriverManager().install())
        driver.get(url)
        restaurantname = driver.find_element_by_xpath('//*[@id="root"]/div/div[1]/div[2]/div/div[1]/header/div[2]/h1').text
        rating = driver.find_element_by_xpath('//*[@id="root"]/div/div[1]/div[2]/div/div[1]/header/div[2]/div[1]/div[3]/div/span[1]').text
        #restauranttype = driver.find_element_by_xpath('//*[@id="root"]/div/div[1]/div[2]/div/div[1]/header/div[2]/div[1]/div[1]/span').text
        #Store into / print out
        print(restaurantname, rating, restauranttype)

The XPaths are already correct, but Selenium opens Chrome every time and needs the page to finish loading before the content can be scraped. With the code I provided above, the error popped up before the first page had even finished loading.

Is there a way to implement some code to "pause the for loop" so each page can load and be scraped before moving on to the next item in the URL dataframe?

Please use the below to create the URL dataframe:

url = ["https://www.doordash.com/store/popeyes-toronto-254846/en-CA", "https://www.doordash.com/store/sunset-grill-toronto-211003/en-CA"]

df = pd.DataFrame(url, columns=["URL"])
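For reference, this builds a single-column dataframe, so the loop should read the link values from that column rather than iterating the dataframe object itself (a minimal sketch, assuming the column is named "URL" as above):

import pandas as pd

df = pd.DataFrame(url, columns=["URL"])

# iterating the dataframe itself yields column labels, not rows,
# so pull the values out of the "URL" column instead
for link in df["URL"]:
    print(link)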

The error message (the full traceback is much longer) says no such element (NoSuchElementException). However, when I tried the XPaths individually after the page was done loading, the elements were found and the right content was scraped. It is only when I try to scrape multiple pages that it gives me an error.

Any help would be appreciated!

3 Answers

You can pause the script by using the time module.

import time

time.sleep(2)

Put it between the request and the scrape lines.

The script will pause for the time you put in the parentheses, in seconds; in this case, 2 seconds.

Do some tests and use the shortest time that lets the script work reliably.
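For example, dropped between the request and the scrape in the loop from the question (a sketch, assuming two seconds is enough for the page to render):

import time

driver.get(url)   # request the page
time.sleep(2)     # pause so the dynamic content has time to render
restaurantname = driver.find_element_by_xpath('//*[@id="root"]/div/div[1]/div[2]/div/div[1]/header/div[2]/h1').text  # scrape after the pause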

As Fabix said, the time module will allow you to sleep your code before you retrieve the elements from the webpage.

Additionally, to prevent the Chrome driver from opening a new instance for every URL, open the browser outside of the loop.

import time

from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager

def ScrapeDoorDash(urls):
    with webdriver.Chrome(ChromeDriverManager().install()) as driver:
        for url in urls:
            print(url)
            driver.get(url)
            time.sleep(3)
            restaurantname = driver.find_element_by_xpath('//*[@id="root"]/div/div[1]/div[2]/div/div[1]/header/div[2]/h1').text
            rating = driver.find_element_by_xpath('//*[@id="root"]/div/div[1]/div[2]/div/div[1]/header/div[2]/div[1]/div[3]/div/span[1]').text
            restauranttype = driver.find_element_by_xpath('//*[@id="root"]/div/div[1]/div[2]/div/div[1]/header/div[2]/div[1]/div[1]/span').text
            # store / print out the results
            print(restaurantname, rating, restauranttype)

By using with webdriver.Chrome(ChromeDriverManager().install()) as driver:, the driver connection will be closed automatically when you exit the with statement.
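For example, you could then call the function with the URL column from the question's dataframe (assuming the column is named "URL"):

# the for loop then iterates directly over the link strings
ScrapeDoorDash(df["URL"])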

I suggest using waits. They are probably better than time.sleep because you don't have to find the perfect time yourself and they are more reliable, though they make the code a bit bigger (you can wrap the wait in a function, as in the sketch after the snippet below):

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException

xpath = "..."
wait_time = 10
# the driver will try to find the element by xpath for 10 seconds;
# if it cannot find it, a TimeoutException is raised

interval = 0.1  # time between attempts to locate the xpath (0.5 seconds by default)

# returns the found element
elem = WebDriverWait(driver, wait_time, interval).until(EC.presence_of_element_located((By.XPATH, xpath)))
some = elem.text

To avoid opening the browser every time, see ZacLanghorne's answer.
