
I am trying to scrape data from https://www.doordash.com/food-delivery/chicago-il-restaurants/

The idea is to scrape all the data regarding the different restaurant listings on the website. The site is divided into different cities, but I only require restaurant data for Chicago.

All restaurant listings for the city have to be scraped, along with any other relevant data about the respective restaurants (e.g., reviews, rating, cuisine, address, state, etc.). I need to capture all the respective details (currently 4,326 listings) for the city in Excel.

I have tried to extract the restaurant name, cuisine, ratings, and reviews inside the class named "StoreCard_root___1p3uN", but no data is displayed. The output is blank.


from selenium import webdriver

chrome_path = r"D:\python project\chromedriver.exe"

driver = webdriver.Chrome(chrome_path)

driver.get("https://www.doordash.com/food-delivery/chicago-il-restaurants/")

driver.find_element_by_xpath("""//*[@id="SeoApp"]/div/div[1]/div/div[2]/div/div[2]/div/div[2]/div[1]/div[3]""").click()

posts = driver.find_elements_by_class_name("StoreCard_root___1p3uN")

for post in posts:
    print(post.text)


  • What is your question? Have you received an error whilst trying to scrape data from this website? If so, please tell us what error you are trying to solve; I'm not sure what you require. Commented Dec 5, 2019 at 10:18
  • Make your life easy, man! Use the API: api.doordash.com/v2/seo_city_stores/… Commented Dec 5, 2019 at 10:26

3 Answers


You can use the API URL, since the data is actually rendered from it via an XHR request.

Iterate over the API link below and scrape whatever you want:

https://api.doordash.com/v2/seo_city_stores/?delivery_city_slug=chicago-il-restaurants&store_only=true&limit=50&offset=0

You just loop over the offset parameter, increasing it by 50 each time (each page shows 50 items) until you reach 4,300, which is the last page: simply range(0, 4350, 50).

import requests
import pandas as pd

data = []
# Page through the API 50 stores at a time until all ~4,326 listings are covered.
for offset in range(0, 4350, 50):
    print(f"Extracting offset# {offset}")
    r = requests.get(
        f"https://api.doordash.com/v2/seo_city_stores/?delivery_city_slug=chicago-il-restaurants&store_only=true&limit=50&offset={offset}").json()
    for store in r['store_data']:
        # Pull only the fields we need from each store record.
        data.append((store['name'], store['city'], store['category'],
                     store['num_ratings'], store['average_rating'], store['average_cost']))

df = pd.DataFrame(
    data, columns=['Name', 'City', 'Category', 'Num Ratings', 'Average Rating', 'Average Cost'])
df.to_csv('output.csv', index=False)
print("done")



Comments

How do we get the address?

I faced this issue too, but I solved it using Selenium and BeautifulSoup by doing the following:

  1. Make sure the script clicks the button to reveal the menu and prices, if necessary.
  2. Process the menu and prices after extraction: they might come back as nested lists after parsing, so the get_text() function won't work on them right away. The code and explanation can be found in this Medium article; a minimal sketch follows below.

Tackling empty list web scraping with selenium
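
A minimal sketch of that approach, assuming hypothetical selectors (DoorDash's generated class names such as "StoreCard_root___1p3uN" change between builds, so treat them as placeholders):

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()  # assumes a chromedriver Selenium can locate
driver.get("https://www.doordash.com/food-delivery/chicago-il-restaurants/")

# Wait for the dynamically rendered cards instead of reading the page immediately;
# reading too early is the usual cause of the blank output in the question.
WebDriverWait(driver, 15).until(
    EC.presence_of_all_elements_located((By.CLASS_NAME, "StoreCard_root___1p3uN")))

# Step 1: click whatever reveals the menu and prices, if the page hides them.
# The selector below is a placeholder, not DoorDash's real markup.
# driver.find_element(By.CSS_SELECTOR, "button.menu-toggle").click()

# Step 2: hand the rendered HTML to BeautifulSoup; separator/strip flattens
# nested tags into one string, so get_text() works even on nested lists.
soup = BeautifulSoup(driver.page_source, "html.parser")
for card in soup.find_all(class_="StoreCard_root___1p3uN"):
    print(card.get_text(separator=" | ", strip=True))

driver.quit()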



I have checked out the API that αԋɱҽԃ αмєяιcαη mentioned. They also have an endpoint for restaurant info.

URL https://api.doordash.com/v2/restaurant/[restaurantId]/

It was working until recently, when it started returning {"detail":"Request was throttled."}

Has anyone had the same issue / found a workaround?
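
No confirmed workaround here, but for plain rate limiting the usual first thing to try is slowing down and retrying with exponential backoff. This is a generic sketch, not a documented fix for this endpoint, and the restaurant ID below is hypothetical:

import time
import requests

def get_with_backoff(url, max_retries=5):
    """Retry a GET, backing off exponentially while the server throttles us."""
    delay = 1
    for _ in range(max_retries):
        r = requests.get(url)
        # The endpoint answered with a throttling message; 429 is the usual status.
        if r.status_code != 429 and "throttled" not in r.text.lower():
            return r
        time.sleep(delay)
        delay *= 2  # 1s, 2s, 4s, ...
    raise RuntimeError(f"Still throttled after {max_retries} attempts: {url}")

get_with_backoff("https://api.doordash.com/v2/restaurant/123/")  # hypothetical ID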

