I am webscraping Glassdoor.com for company reviews using Python.

Currently, I am using Beautiful Soup and grequests. This works for every field I need except the "Advice to Management" section, which only loads once the Continue Reading button is pressed. See the example below from this page of reviews:

[Screenshots: the Continue Reading button; the expanded review]

As far as I can tell the URL does not change, but a JS click event is logged in the console: Event: EiReviews: Click [continueReading-71858088]

I found a Selenium WebDriver tutorial online (such as this one) and wrote this code:

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

# Start Chrome using the local chromedriver binary
driver = webdriver.Chrome(service=Service("C:\\chromedriver.exe"))
driver.get("https://www.glassdoor.com/Reviews/Alteryx-Reviews-E351220.htm")

# Locate the button first; chaining .click() here would leave btn as None
btn = driver.find_element(By.CLASS_NAME, "v2__EIReviewDetailsV2__continueReading")
driver.execute_script("arguments[0].click();", btn)

I need something that scales better, as this takes ~20 seconds to open Chrome and click a single button. I need to be able to click every "Continue Reading" button on the page, as my end goal is to scrape every review for ~1,000 companies.

  • Looking at the HTML of the page, you will notice that right before the <div id="Container"> element there is a script element starting with window.appCache={.... It contains the complete reviews in an unusual dictionary/JSON-like format, including the text that appears when you click Continue Reading, for example "summary":"Great place to work, been here 4+ years","summaryOriginal":null,"advice":"Don't rush too finish a project". Maybe you can extract everything from there. Commented Jan 2, 2023 at 10:54
  • Alternatively, you can load the site with Selenium, loop through all the reviews, and automatically click the Continue Reading button wherever it is present. Commented Jan 2, 2023 at 10:55
  • Thanks! The window.appCache dict has all the information I need. Commented Jan 2, 2023 at 21:37
  • Good! Is it ok if I post the comment with the solution as an answer so that you can then accept it and the question is closed? Commented Jan 3, 2023 at 7:53

1 Answer

Looking at the HTML of the page, you will notice that right before the <div id="Container"> element there is a script element starting with window.appCache={.... It contains the complete reviews in a dictionary-like format, including the text that only appears on the page after you click Continue Reading. For example:

"summary":"Great place to work, been here 4+ years",
"summaryOriginal":null,"advice":"Don't rush too finish a project"
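
As a rough illustration of pulling those fields out without a browser, here is a minimal sketch. It assumes the page source is already available as a string (for example from the existing grequests response) and that the values are JSON-style string literals or null, as in the fragment above. The helper name extract_field and the "reviews" wrapper in the sample string are hypothetical, and the real window.appCache content may need more careful parsing.

```python
import json
import re

# Matches a JSON-style pair whose value is null or a double-quoted string
# (backslash escapes are allowed inside the string).
FIELD_RE = r'"{key}"\s*:\s*(null|"(?:[^"\\]|\\.)*")'


def extract_field(html, key):
    """Return every value stored under `key` in the page source, in order.

    Hypothetical helper: it scans the raw text instead of parsing the whole
    window.appCache object, so it does not depend on the object being
    strictly valid JSON.
    """
    pattern = re.compile(FIELD_RE.format(key=re.escape(key)))
    # json.loads decodes a quoted literal to str and null to None.
    return [json.loads(raw) for raw in pattern.findall(html)]


# Made-up page fragment built from the values quoted above; the "reviews"
# wrapper is an assumption about the surrounding structure.
sample_html = (
    '<script>window.appCache={"reviews":[{"summary":"Great place to work, '
    'been here 4+ years","summaryOriginal":null,'
    '"advice":"Don\'t rush too finish a project"}]};</script>'
    '<div id="Container"></div>'
)

print(extract_field(sample_html, "advice"))
print(extract_field(sample_html, "summary"))
```

Because this only needs the raw HTML, it can run on the same responses already being fetched for the other fields, with no button click required. If a value contains escapes that are valid in JavaScript but not in JSON, json.loads will raise an error, so treat this as a starting point.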