
I'm trying to click the "More" button for each review so that the truncated text expands to the full content, and then scrape those text reviews. Without clicking "More", what I end up retrieving is something like
"This room was nice and clean. The location...More".

I tried a few different approaches, such as Selenium's button click and ActionChains, but I guess I'm not using them properly. Could someone help me out with this issue?

Below is my current code. I didn't include the whole script, to keep the example simple and avoid unnecessary output.

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver import ActionChains
import time

#Incognito Mode
option=webdriver.ChromeOptions()
option.add_argument("--incognito")

#Open Chrome
driver=webdriver.Chrome(executable_path="C:/Users/chromedriver.exe",chrome_options=option)

#url I want to visit.
lists=['https://www.tripadvisor.com/VacationRentalReview-g30196-d6386734-Hot_51st_St_Walk_to_Mueller_2BDR_Modern_sleeps_7-Austin_Texas.html']

for k in lists:

    driver.get(k)
    html =driver.page_source
    soup=BeautifulSoup(html,"html.parser")
    time.sleep(3)
    listing=soup.find_all("div", class_="review-container")

    for i in range(len(listing)):

        try:
            #First, I tried this but didn't work.
            #link = driver.find_element_by_link_text('More')
            #driver.execute_script("arguments[0].click();", link)

            #Second, I tried ActionChains but it didn't work either.
            ActionChains(driver).move_to_element(i).click().perform()
        except:
            pass

        text_review=soup.find_all("div", class_="prw_rup prw_reviews_text_summary_hsx")
        text_review_inside=text_review[i].find("p", class_="partial_entry")
        review_text=text_review_inside.text

        print (review_text)
  • The biggest mistake in all this code is except: pass. Without it you would have resolved the problem long ago. The code raises an error message with all the information, but you can't see it. Commented Oct 25, 2019 at 1:51

1 Answer


Your biggest mistake in all this code is except: pass. Without it you would have resolved the problem long ago. The code raises an error message with all the information, but you can't see it. You could at least use

except Exception as ex:
    print(ex)

The problem is that move_to_element() will not work with BeautifulSoup elements. It has to be a Selenium element, like

link = driver.find_element_by_link_text('More')

ActionChains(driver).move_to_element(link).perform()

But after executing some actions Selenium needs time to carry them out, so Python has to wait a while.
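Rather than fixed time.sleep() calls, an explicit wait that polls a condition until it holds is usually more reliable. Here is a minimal pure-Python sketch of the polling idea behind Selenium's WebDriverWait (the helper name wait_until is my own, not a Selenium API):

```python
import time

def wait_until(condition, timeout=10, poll=0.5):
    """Poll `condition` until it returns a truthy value or `timeout` expires."""
    end = time.monotonic() + timeout
    while time.monotonic() < end:
        result = condition()
        if result:
            return result
        time.sleep(poll)
    raise TimeoutError("condition not met within %s seconds" % timeout)
```

With Selenium itself you would instead use WebDriverWait(driver, 10).until(...) together with the conditions in selenium.webdriver.support.expected_conditions, e.g. element_to_be_clickable.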

I don't use BeautifulSoup to get the data, but if you want to use it, then read driver.page_source after clicking all the links. Otherwise you will have to re-read driver.page_source after every click.

Sometimes after clicking you may even have to re-fetch the Selenium elements, so I first get entry to click "More" and only later get partial_entry to read the reviews.

I found that clicking "More" on the first review expands the text for all reviews, so there is no need to click every "More".

Tested with Firefox 69, Linux Mint 19.2, Python 3.7.5, Selenium 3.141


#from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver import ActionChains
import time

#Incognito Mode
option = webdriver.ChromeOptions()
option.add_argument("--incognito")

#Open Chrome
#driver = webdriver.Chrome(executable_path="C:/Users/chromedriver.exe",chrome_options=option)

driver = webdriver.Firefox()

#url I want to visit.
lists = ['https://www.tripadvisor.com/VacationRentalReview-g30196-d6386734-Hot_51st_St_Walk_to_Mueller_2BDR_Modern_sleeps_7-Austin_Texas.html']

for url in lists:

    driver.get(url)
    time.sleep(3)

    link = driver.find_element_by_link_text('More')

    try:
        ActionChains(driver).move_to_element(link).perform()
        time.sleep(1) # time to move to link

        link.click()
        time.sleep(1) # time to update HTML
    except Exception as ex:
        print(ex)

    description = driver.find_element_by_class_name('vr-overview-Overview__propertyDescription--1lhgd')
    print('--- description ---')
    print(description.text)
    print('--- end ---')

    # first "More" shows text in all reviews - there is no need to search other "More"
    first_entry = driver.find_element_by_class_name('entry')
    more = first_entry.find_element_by_tag_name('span')

    try:
        ActionChains(driver).move_to_element(more).perform()
        time.sleep(1) # time to move to link

        more.click()
        time.sleep(1) # time to update HTML
    except Exception as ex:
        print(ex)

    all_reviews = driver.find_elements_by_class_name('partial_entry')
    print('all_reviews:', len(all_reviews))

    for i, review in enumerate(all_reviews, 1):
        print('--- review', i, '---')
        print(review.text)
        print('--- end ---')

EDIT:

To skip responses I search for all class="wrap" and then, inside every wrap, search for class="partial_entry". Every wrap can contain only one review and possibly one response, and the review always has index [0]. Some wraps don't hold a review, so they give an empty list, and I have to check for that before I can take element [0] from the list.

all_reviews = driver.find_elements_by_class_name('wrap')
#print('all_reviews:', len(all_reviews))

for review in all_reviews:
    all_entries = review.find_elements_by_class_name('partial_entry')
    if all_entries:
        print('--- review ---')
        print(all_entries[0].text)
        print('--- end ---')
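The same wrap-then-index filtering can be checked offline with BeautifulSoup on a small HTML snippet shaped like the review markup; the snippet below is made up for illustration, not TripAdvisor's actual markup:

```python
from bs4 import BeautifulSoup

# Fabricated example markup: each "wrap" holds at most one review,
# optionally followed by a host response with the same class.
html = """
<div class="wrap">
  <div class="partial_entry">Great stay, very clean.</div>
  <div class="partial_entry">Thanks for visiting! - the host</div>
</div>
<div class="wrap"></div>
<div class="wrap">
  <div class="partial_entry">Nice location.</div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
reviews = []
for wrap in soup.find_all("div", class_="wrap"):
    entries = wrap.find_all("div", class_="partial_entry")
    if entries:                          # some wraps hold no review at all
        reviews.append(entries[0].text)  # index [0] skips the host response
```

The empty middle wrap is skipped and only the first partial_entry in each wrap is kept, so the host response never reaches the results.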

3 Comments

Hi, I appreciate your revised comments. But it looks like I'm also retrieving the managerial response (the host's comments on reviews) at the same time, because text reviews and managerial responses share the same div class. Is there any way not to collect the managerial responses?
There are other functions to search for elements, and you can create more complex rules or functions. You can even use XPath. In every class="wrap" there is only one review and at most one managerial response, so if you first find all "wrap" elements and then search for reviews inside each "wrap", the first result will be your review: all_reviews_in_wrap[0]
Hi, is there any reason why the action chain does not work on the 2nd page of reviews for that listing? The 'More' button is not clickable on the 2nd page of reviews.
