I am trying to fetch Google reviews with Selenium in Python. I have imported webdriver from the selenium module and initialized self.driver as follows:

self.driver = webdriver.Chrome(executable_path="./chromedriver.exe",chrome_options=webdriver.ChromeOptions())

After this, I use the following code to type, on the Google homepage, the name of the company whose reviews I need. For now I am trying to fetch reviews for "STANLEY BRIDGE CYCLES AND SPORTS LIMITED":

company_name = self.driver.find_element_by_name("q")
company_name.send_keys("STANLEY BRIDGE CYCLES AND SPORTS LIMITED ")
time.sleep(2)

After this, I click the Google search button using the following code:

self.driver.find_element_by_name("btnK").click()
time.sleep(2)

Then finally I am on the results page. Now I want to click on the "View all Google reviews" button, using the following code:

self.driver.find_elements_by_link_text("View all Google reviews")[0].click()
time.sleep(2)

Now I am able to get reviews, but only 10, and I need at least 20 reviews per company. To get them I am trying to scroll the page down with the following code:

self.driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
time.sleep(5)

Even with the above code to scroll down the page, I am still getting only 10 reviews, and no error is raised.

I need help scrolling down the page so I can get at least 20 reviews; as of now I can only get 10. Based on my online search, people mostly use

driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")

to scroll the page down when required, but for me it is not working: I checked, and the height of the page before and after running it is the same.

4 Answers


Use JavaScript to scroll to the last review; this will trigger loading of additional reviews.

last_review = self.driver.find_element_by_css_selector('div.gws-localreviews__google-review:last-of-type')
self.driver.execute_script('arguments[0].scrollIntoView(true);', last_review)

EDIT:

The following example works correctly for me on Firefox and Chrome; you can reuse the extract_google_reviews function for your needs.

import time

from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.wait import WebDriverWait


def extract_google_reviews(driver, query):
    driver.get('https://www.google.com/?hl=en')
    driver.find_element_by_name('q').send_keys(query)
    WebDriverWait(driver, 5).until(EC.element_to_be_clickable((By.NAME, 'btnK'))).click()

    reviews_header = driver.find_element_by_css_selector('div.kp-header')
    reviews_link = reviews_header.find_element_by_partial_link_text('Google reviews')
    number_of_reviews = int(reviews_link.text.split()[0])
    reviews_link.click()

    all_reviews = WebDriverWait(driver, 3).until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, 'div.gws-localreviews__google-review')))
    while len(all_reviews) < number_of_reviews:
        driver.execute_script('arguments[0].scrollIntoView(true);', all_reviews[-1])
        WebDriverWait(driver, 5, 0.25).until_not(EC.presence_of_element_located((By.CSS_SELECTOR, 'div[class$="activityIndicator"]')))
        all_reviews = driver.find_elements_by_css_selector('div.gws-localreviews__google-review')

    reviews = []
    for review in all_reviews:
        try:
            full_text_element = review.find_element_by_css_selector('span.review-full-text')
        except NoSuchElementException:
            full_text_element = review.find_element_by_css_selector('span[class^="r-"]')
        reviews.append(full_text_element.get_attribute('textContent'))

    return reviews

if __name__ == '__main__':
    driver = webdriver.Firefox()
    try:
        reviews = extract_google_reviews(driver, 'STANLEY BRIDGE CYCLES AND SPORTS LIMITED')
    finally:
        driver.quit()

    print(reviews)

5 Comments

Thanks Dalvenjia. I tried this approach and checked the height of my document before and after. This is the code I am using:

before_height = self.driver.execute_script("return document.body.scrollHeight")
last_review = self.driver.find_element_by_css_selector('div.gws-localreviews__google-review:last-of-type')
self.driver.execute_script('arguments[0].scrollIntoView(true);', last_review)
after_height = self.driver.execute_script("return document.body.scrollHeight")

For me, before_height and after_height come out the same.
find_element_by_css_selector only sees elements up to the 10th, even though the last element should be the 26th, as there are 26 reviews for this company; that is why I can fetch only the first 10 reviews. If I manually set the nth-child to anything up to 10, it works, for example: last_review = self.driver.find_element_by_css_selector('div.gws-localreviews__google-review:nth-child(10)'). If I set nth-child > 10, it throws: selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element:
For some companies with more than 100 reviews (e.g. FIRST MILE LIMITED), the following command does not retrieve more than 10 results: all_reviews = WebDriverWait(driver, 3).until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, 'div.gws-localreviews__google-review'))). Because there are so many reviews for this company, Selenium is unable to scroll all the way to the bottom. Could you please suggest a way to retrieve all the reviews for this company?
@NidhiArora It's working fine for me; with 'FIRST MILE LIMITED' it extracts 111 reviews. It might be due to network lag. I changed one line in the script, please try again: instead of a hard sleep I'm now using WebDriverWait.
This worked for me perfectly. One catch: occasionally it doesn't scroll all the way down to the bottom and misses loading some reviews, so I have to sit and scroll manually those times. In that sense it is semi-automated. Any hack around that?

driver.execute_script('window.scrollTo(0, [hard-coded height])')

For me, I would hardcode the height if I were running this automated test against the same page over and over again.

Or you can loop continuously, scrolling down the page until the element is found, if any.
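That loop can be sketched in a driver-agnostic way. Note this is only an illustration: scroll_until_found and its find/scroll callables are hypothetical names of mine, not Selenium API.

```python
def scroll_until_found(find, scroll, max_scrolls=20):
    # Scroll repeatedly until `find()` returns a truthy result, or give up
    # after `max_scrolls` attempts. `find` and `scroll` are plain callables,
    # so the loop itself carries no Selenium dependency; with a real driver
    # they might be (hypothetical wiring):
    #   find   = lambda: driver.find_elements_by_css_selector('div.gws-localreviews__google-review')
    #   scroll = lambda: driver.execute_script('window.scrollBy(0, 500);')
    for _ in range(max_scrolls):
        result = find()
        if result:
            return result
        scroll()
    return None
```

Passing callables keeps the retry logic separate from Selenium, so the same loop works whether you scroll the window or an inner pane.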

4 Comments

Thanks Xion. I tried this approach and hardcoded the height of the page. I set it to 2000, as the actual height of the page is around 1500, and tried 3000 and 4000 as well. I was still unable to fetch more than 10 reviews.
Then the only other possible solution is ActionChains: have it find the scrollbar element and drag it to make the page scroll down.
I will try that approach. If you can share any sample code for this approach, it will be of great help. Thanks a lot.
body = browser.find_element_by_css_selector('body'); body.send_keys(Keys.PAGE_DOWN) — something like this may help. You can read more about ActionChains.

Alternatively, you can get all of the reviews without browser automation.

The only thing you need is the data_fid, which you can find in the page source of the place you searched for.

[screenshot: page source with the data_fid value highlighted]

In this case that's: 0x48762038283b0bc3:0xc373b8d4227d0090

After that, you just have to make a request to: https://www.google.com/async/reviewDialog?hl=en&async=feature_id:0x48762038283b0bc3:0xc373b8d4227d0090,sort_by:,next_page_token:,associated_topic:,_fmt:pc

There you will find all the reviews data, as well as the next_page_token, so you can query the next 10 reviews.

In this case next_page_token is: EgIICg

So, the request URL for the next 10 reviews would be: https://www.google.com/async/reviewDialog?hl=en&async=feature_id:0x48762038283b0bc3:0xc373b8d4227d0090,sort_by:,next_page_token:EgIICg,associated_topic:,_fmt:pc
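Since the two URLs differ only in the token, building them can be scripted. This is just a sketch: the endpoint is undocumented and internal to Google, so its shape may change at any time, and review_dialog_url is a helper name I made up.

```python
def review_dialog_url(feature_id, next_page_token=''):
    # Build the internal reviewDialog URL described above. An empty
    # next_page_token yields the first 10 reviews; passing the token found
    # in the previous response yields the next page.
    return ('https://www.google.com/async/reviewDialog?hl=en'
            '&async=feature_id:{},sort_by:,next_page_token:{},'
            'associated_topic:,_fmt:pc').format(feature_id, next_page_token)
```

Fetching each page with requests and pulling the next token out of the response would then let you walk the whole review list.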

You could also use a third party solution like SerpApi. It's a paid API with a free trial. We handle proxies, solve captchas, and parse all rich structured data for you.

Example python code (available in other libraries also):

from serpapi import GoogleSearch

params = {
  "api_key": "secret_api_key",
  "engine": "google_maps_reviews",
  "hl": "en",
  "data_id": "0x48762038283b0bc3:0xc373b8d4227d0090",
}

search = GoogleSearch(params)
results = search.get_dict()

Example JSON output:

"place_info": {
  "title": "Stanley Bridge Cycles & Sports Ltd",
  "address": "Newnham Parade, 11 College Rd, Cheshunt, Waltham Cross, United Kingdom",
  "rating": 5,
  "reviews": 53
},
"reviews": [
  {
    "user": {
      "name": "Armilson Correia",
      "link": "https://www.google.com/maps/contrib/102797076683495103766?hl=en-US&sa=X&ved=2ahUKEwja2tvQj-DxAhUHMVkFHcJuD_MQvvQBegQIARAh",
      "thumbnail": "https://lh3.googleusercontent.com/a-/AOh14GgCCH69E_qgfu3pa1xbTsyvH9ORn8PEonb5FcubKg=s40-c-c0x00000000-cc-rp-mo-ba3-br100",
      "local_guide": true,
      "reviews": 48,
      "photos": 9
    },
    "rating": 5,
    "date": "2 days ago",
    "snippet": "In my opinion The best bike shop In radios of 60 miles Very professional and excellent customer service My bike come out from there riding like a New ,no Words just perfect"
  },
  {
    "user": {
      "name": "John Janes",
      "link": "https://www.google.com/maps/contrib/104286744244406721398?hl=en-US&sa=X&ved=2ahUKEwja2tvQj-DxAhUHMVkFHcJuD_MQvvQBegQIARAt",
      "thumbnail": "https://lh3.googleusercontent.com/a/AATXAJzRZRQx74RYqpNQArE0ER-d24iQ-3kAwK64-46u=s40-c-c0x00000000-cc-rp-mo-br100",
      "reviews": 2,
      "photos": 1
    },
    "rating": 5,
    "date": "a year ago",
    "snippet": "The guys recently built my new bike and the advice on components to use was invaluable. Even the wheels were built from scratch. A knowledgeable efficient team with great attention to detail. I wouldn't go anywhere else .",
    "likes": 1,
    "images": [
      "https://lh5.googleusercontent.com/p/AF1QipMc5u1rIZ88w-cfeAeF2s6bSndHMhLw8YC_BllS=w100-h100-p-n-k-no"
    ]
  },
  {
    "user": {
      "name": "James Wainwright",
      "link": "https://www.google.com/maps/contrib/116302076794615919905?hl=en-US&sa=X&ved=2ahUKEwja2tvQj-DxAhUHMVkFHcJuD_MQvvQBegQIARA6",
      "thumbnail": "https://lh3.googleusercontent.com/a/AATXAJwx8OTba1pQ9lrzxy7LU5BnrJYWu90METBaK68F=s40-c-c0x00000000-cc-rp-mo-br100",
      "reviews": 36,
      "photos": 7
    },
    "rating": 5,
    "date": "a month ago",
    "snippet": "Want to thank the guys for giving my bike the full service it needed .Its now like new again and I didn't realise how much had worn out.Recomend to anyone in the cheshunt area."
  },
  ...
]

Check out the documentation for more details.

Disclaimer: I work at SerpApi.



Please share your page URL. I've just checked, and scrollTo works:

driver.execute_script('window.scrollTo(0, document.body.scrollHeight);')

Alternatively, you can scroll smoothly:

driver.execute_script('window.scrollTo({ top: document.body.scrollHeight, behavior: "smooth" });')
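For completeness, the usual "scroll until the height stops growing" pattern looks like the sketch below; the callables stand in for execute_script calls so the loop itself is testable without a browser. One caveat, as an assumption on my part: if the reviews load inside an inner scrollable pane rather than the page body, document.body.scrollHeight never changes, which would explain why this pattern fails for the asker.

```python
def scroll_to_bottom(get_height, scroll_to, max_rounds=30):
    # Scroll to the current document height repeatedly until it stops
    # growing (i.e. no more content is lazy-loaded). With Selenium the
    # callables might be (hypothetical wiring):
    #   get_height = lambda: driver.execute_script('return document.body.scrollHeight')
    #   scroll_to  = lambda y: driver.execute_script('window.scrollTo(0, arguments[0]);', y)
    last = get_height()
    for _ in range(max_rounds):
        scroll_to(last)
        new = get_height()
        if new == last:
            break
        last = new
    return last
```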

