
I only want to scrape the required information contained in the black box, and delete/remove/exclude the information contained in the red box.

I am doing this because the class names "entry" and "partial_entry" exist in both boxes. Only the first "partial_entry" contains the information I need, so I plan to delete/remove/exclude the elements with class name "mgrRspnInLine".

My code is:

while True:
    container = driver.find_elements_by_xpath('.//*[contains(@class,"review-container")]')
    for item in container:
        try:
            element = item.find_element_by_class_name('mgrRspnInline')
            driver.execute_script("""var element = document.getElementsByClassName("mgrRspnInline")[0];element.parentNode.removeChild(element);""", element)
            WebDriverWait(driver, 50).until(EC.presence_of_element_located((By.XPATH,'.//*[contains(@class,"taLnk ulBlueLinks")]')))
            element = WebDriverWait(driver, 50).until(EC.element_to_be_clickable((By.XPATH,'.//*[contains(@class,"taLnk ulBlueLinks")]')))
            element.click()
            time.sleep(2)
            rating = item.find_elements_by_xpath('.//*[contains(@class,"ui_bubble_rating bubble_")]')
            for rate in rating:
                rate = rate.get_attribute("class")
                rate = str(rate)
                rate = rate[-2:]
                score_list.append(rate)
            time.sleep(2)
            stay = item.find_elements_by_xpath('.//*[contains(@class,"recommend-titleInline noRatings")]')
            for stayed in stay:
                stayed = stayed.text
                stayed = stayed.split(', ')
                stayed.append(stayed[0])
                travel_type.append(stayed[1])
            WebDriverWait(driver, 50).until(EC.presence_of_element_located((By.XPATH,'.//*[contains(@class,"noQuotes")]')))
            summary = item.find_elements_by_xpath('.//*[contains(@class,"noQuotes")]')
            for comment in summary:
                comment = comment.text
                comments.append(comment)
            WebDriverWait(driver, 50).until(EC.presence_of_element_located((By.XPATH,'.//*[contains(@class,"ratingDate")]')))
            rating_date = item.find_elements_by_xpath('.//*[contains(@class,"ratingDate")]')
            for date in rating_date:
                date = date.get_attribute("title")
                date = str(date)
                review_date.append(date)
            WebDriverWait(driver, 50).until(EC.presence_of_element_located((By.XPATH,'.//*[contains(@class,"partial_entry")]')))
            review = item.find_elements_by_xpath('.//*[contains(@class,"partial_entry")]')
            for comment in review:
                comment = comment.text
                print(comment)
                reviews.append(comment)
        except (NoSuchElementException) as e:
            continue
    try:
        element = WebDriverWait(driver, 100).until(EC.element_to_be_clickable((By.XPATH,'.//*[contains(@class,"nav next taLnk ui_button primary")]')))
        element.click()
        time.sleep(2)
    except (ElementClickInterceptedException,NoSuchElementException) as e:
        print(e)
        break

Within each "review-container", I first search for the class name "mgrRspnInLine" and then try to delete that element with execute_script.

Unfortunately, the output still shows the contents of "mgrRspnInLine".

  • Your code for removing the element should work. There might be several elements with the class name mgrRspnInLine (hidden?), so you are probably removing the wrong one. You can simplify your code to driver.execute_script("""arguments[0].parentNode.removeChild(arguments[0]);""", element) Commented Nov 19, 2018 at 11:45

4 Answers


If you want to avoid matching the second element, you can modify your XPath as below:

.//*[contains(@class,"partial_entry") and not(ancestor::*[@class="mgrRspnInLine"])]

This matches an element with class name "partial_entry" only if it has no ancestor whose class attribute is exactly "mgrRspnInLine".


1 Comment

Awesome expression @sir Andersson. Always something new to learn.
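To check the "not inside a management response" rule outside the browser, here is a minimal standard-library sketch that applies the same filter as the XPath above: collect every element whose class list contains "partial_entry", unless one of its ancestors has "mgrRspnInLine". It is an illustration, not Selenium code. `ReviewTextParser`, `review_texts` and the sample markup are hypothetical, it assumes well-formed HTML, and it tests class tokens rather than comparing the whole class attribute the way `@class="mgrRspnInLine"` does.

```python
from html.parser import HTMLParser

# Sketch (standard library only): keep elements whose class list contains
# TARGET, unless some ancestor's class list contains EXCLUDED. Same idea as
# .//*[contains(@class,"partial_entry") and not(ancestor::*[@class="mgrRspnInLine"])]
TARGET = "partial_entry"
EXCLUDED = "mgrRspnInLine"

# Elements that never get a closing tag, so they are not pushed on the stack
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ReviewTextParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.stack = []             # class lists of the currently open elements
        self.results = []           # text of each matching element, in document order
        self._capture_depth = None  # stack depth of the element being captured
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        inside_excluded = any(EXCLUDED in open_classes for open_classes in self.stack)
        self.stack.append(classes)
        if self._capture_depth is None and TARGET in classes and not inside_excluded:
            self._capture_depth = len(self.stack)
            self._buffer = []

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.stack:
            return
        if self._capture_depth == len(self.stack):
            # Closing the element we were capturing: store its text
            self.results.append("".join(self._buffer).strip())
            self._capture_depth = None
        self.stack.pop()

    def handle_data(self, data):
        if self._capture_depth is not None:
            self._buffer.append(data)


def review_texts(html):
    """Return the text of every TARGET element that is not inside EXCLUDED."""
    parser = ReviewTextParser()
    parser.feed(html)
    parser.close()
    return parser.results


# Hypothetical markup that reuses the class names from the question
SAMPLE = """
<div class="review-container">
  <div class="ui_column is-9">
    <p class="partial_entry">Great stay, friendly staff.</p>
    <div class="mgrRspnInLine">
      <p class="partial_entry">Thank you for your review!</p>
    </div>
  </div>
</div>
"""

print(review_texts(SAMPLE))  # ['Great stay, friendly staff.']
```

Because results are kept in document order, `review_texts(...)[0]` also corresponds to the "first occurrence" idea in the next answer.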

If you only want the first occurrence, you could use a CSS class selector instead:

.partial_entry

and retrieve it with find_element_by_css_selector, which returns only the first matching element:

find_element_by_css_selector(".partial_entry")


You can delete all the .mgrRspnInLine elements with:

driver.execute_script("[...document.querySelectorAll('.mgrRspnInLine')].map(el => el.parentNode.removeChild(el))")
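If you need this removal in several places, it can be wrapped in a small function. The sketch below is one way to do it, assuming a standard Selenium WebDriver: `execute_script` forwards extra positional arguments to the script as `arguments[0]`, `arguments[1]`, and so on, so the class name is passed in rather than hard-coded into the string. `REMOVE_BY_CLASS_JS` and `remove_elements_by_class` are hypothetical names. Note that class matching is case-sensitive on standards-mode pages, so the spelling must match the page exactly (the question's code uses mgrRspnInline while its text uses mgrRspnInLine).

```python
# Sketch: remove every element that has the given class, passing the class
# name to the script as arguments[0] instead of embedding it in the string.
REMOVE_BY_CLASS_JS = """
const nodes = Array.from(document.getElementsByClassName(arguments[0]));
nodes.forEach(el => el.remove());
return nodes.length;
"""


def remove_elements_by_class(driver, class_name):
    """Delete all elements with `class_name` from the live DOM.

    Array.from() takes a snapshot first, because getElementsByClassName
    returns a live collection that shrinks as elements are removed.
    Returns whatever execute_script returns (the number of removed elements).
    """
    return driver.execute_script(REMOVE_BY_CLASS_JS, class_name)


# Usage (assumes `driver` is already on the reviews page):
# removed = remove_elements_by_class(driver, "mgrRspnInLine")
```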


By combining the comment from Andersson with the answers from QHarr and pguardiario, I finally solved the problem.

The key is to target a container within the container: all the information is inside the element with class name "ui_column is-9", which is nested inside "review-container". This addresses Andersson's comment about multiple mgrRspnInLine elements.

Within the nested loop, I used pguardiario's suggestion to delete all existing mgrRspnInLine elements, and then added QHarr's answer of selecting .partial_entry:

import time

from selenium.common.exceptions import (
    ElementClickInterceptedException,
    NoSuchElementException,
    TimeoutException,
)
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

# Assumes `driver` is an open WebDriver on the reviews page and that
# score_list, travel_type, comments and review_date are existing lists.
while True:
    container = driver.find_elements_by_xpath('.//*[contains(@class,"review-container")]')
    for items in container:
        # Expand the truncated review text
        element = WebDriverWait(driver, 1000).until(EC.element_to_be_clickable((By.XPATH,'.//*[contains(@class,"taLnk ulBlueLinks")]')))
        element.click()
        time.sleep(10)
        # Search only inside the review's inner column
        contained = items.find_elements_by_xpath('.//*[contains(@class,"ui_column is-9")]')
        for item in contained:
            try:
                # Remove all management responses so .partial_entry matches only the review
                driver.execute_script("[...document.querySelectorAll('.mgrRspnInLine')].map(el => el.parentNode.removeChild(el))")
                # The leading "." keeps the XPath relative to `item`; "//" alone searches the whole page
                rating = item.find_element_by_xpath('.//*[contains(@class,"ui_bubble_rating bubble_")]')
                rate = rating.get_attribute("class")
                rate = str(rate)
                rate = rate[-2:]  # last two characters of the class string, e.g. "40" from "bubble_40"
                score_list.append(rate)
                time.sleep(2)
                stay = item.find_element_by_xpath('.//*[contains(@class,"recommend-titleInline")]')
                stayed = stay.text
                stayed = stayed.split(', ')
                travel_type.append(stayed[1])  # raises IndexError if the text has no ", "
                WebDriverWait(driver, 50).until(EC.presence_of_element_located((By.XPATH,'.//*[contains(@class,"noQuotes")]')))
                summary = item.find_element_by_xpath('.//*[contains(@class,"noQuotes")]')
                comment = summary.text
                comments.append(comment)
                WebDriverWait(driver, 50).until(EC.presence_of_element_located((By.XPATH,'.//*[contains(@class,"ratingDate")]')))
                rating_date = item.find_element_by_xpath('.//*[contains(@class,"ratingDate")]')
                date = rating_date.get_attribute("title")
                date = str(date)
                review_date.append(date)
                WebDriverWait(driver, 50).until(EC.presence_of_element_located((By.XPATH,'.//*[contains(@class,"partial_entry")]')))
                # First .partial_entry in this review (management responses were removed above)
                review = item.find_element_by_css_selector(".partial_entry")
                comment = review.text
                print(comment)
            except (NoSuchElementException, IndexError):
                continue
    try:
        element = WebDriverWait(driver, 100).until(EC.element_to_be_clickable((By.XPATH,'.//*[contains(@class,"nav next taLnk ui_button primary")]')))
        element.click()
        time.sleep(2)
    except (ElementClickInterceptedException, NoSuchElementException, TimeoutException) as e:
        # No clickable "next" button (for example on the last page): stop paging
        print(e)
        break
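Two string steps in the loop above are fragile: `rate[-2:]` assumes the bubble class comes last and has exactly two digits, and `stayed[1]` raises IndexError when the text contains no ", ". A minimal sketch of more defensive helpers follows, assuming the rating class looks like "ui_bubble_rating bubble_40" and the stay text has the form "<first part>, <second part>". `parse_bubble_rating`, `split_stay_line` and the example string are hypothetical.

```python
import re

# Assumption: the rating element's class string contains a token like "bubble_40".
_BUBBLE_RE = re.compile(r"\bbubble_(\d+)\b")


def parse_bubble_rating(class_attr):
    """Return the number after "bubble_" (e.g. 40), or None if there is none.

    Unlike rate[-2:], this works regardless of class order or digit count.
    The \\b before "bubble_" stops it matching inside "ui_bubble_rating".
    """
    match = _BUBBLE_RE.search(class_attr or "")
    return int(match.group(1)) if match else None


def split_stay_line(text):
    """Mimic stayed.split(', ') but return None for a missing second part.

    Returns (first_part, second_part); the loop stores the second part in
    travel_type, where stayed[1] would raise IndexError instead.
    """
    parts = (text or "").split(", ")
    return parts[0], (parts[1] if len(parts) > 1 else None)


print(parse_bubble_rating("ui_bubble_rating bubble_40"))  # 40
print(split_stay_line("Stayed: November 2018, travelled as a couple"))
# ('Stayed: November 2018', 'travelled as a couple')
```

Inside the loop this would look like `score_list.append(parse_bubble_rating(rating.get_attribute("class")))`. Note that it stores integers such as 40 rather than the strings "40" the original code collects.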
