I am working on a web scraping project using Selenium, and I am trying to scrape product links from multiple result pages on Amazon. For example, when I type laptop in the Amazon search bar, the results are spread across multiple pages. I want to scrape the product links from all of those pages and store them in a list.

This is my code so far:

def scrape_pages_selenium(product, total_pages):

    driver = webdriver.Chrome('./chromedriver')

    url = f'https://www.amazon.com/s?k={product}&page=1&ref=nb_sb_noss'

    driver.get(url)
    links = driver.find_elements_by_class_name("a-size-mini")

    product_links = []
    for page in range(1, total_pages+1):

        for link in links:
            product_links.append(link.find_element_by_css_selector('a').get_attribute('href'))

        print(len(product_links))

        try:
            next_page_button = driver.find_element_by_class_name("a-last")
            next_page_button.click()
        except:
            continue

    return product_links

product_links = scrape_pages_selenium('laptop', 7)

This code works correctly on the first page. The next_page_button is then used to go to the next page. But when the code tries to scrape the links from the second page, I get the following error:

StaleElementReferenceException            Traceback (most recent call last)
<ipython-input-50-09cc65b63734> in <module>
     23     return product_links
     24 
---> 25 product_links = scrape_pages_selenium('gatorade', 7)
     26 

<ipython-input-50-09cc65b63734> in scrape_pages_selenium(product, total_pages)
     12 
     13         for link in links:
---> 14             product_links.append(link.find_element_by_css_selector('a').get_attribute('href'))
     15 
     16         print(len(product_links))

~/anaconda3/lib/python3.7/site-packages/selenium/webdriver/remote/webelement.py in find_element_by_css_selector(self, css_selector)
    428             element = element.find_element_by_css_selector('#foo')
    429         """
--> 430         return self.find_element(by=By.CSS_SELECTOR, value=css_selector)
    431 
    432     def find_elements_by_css_selector(self, css_selector):

~/anaconda3/lib/python3.7/site-packages/selenium/webdriver/remote/webelement.py in find_element(self, by, value)
    657 
    658         return self._execute(Command.FIND_CHILD_ELEMENT,
--> 659                              {"using": by, "value": value})['value']
    660 
    661     def find_elements(self, by=By.ID, value=None):

~/anaconda3/lib/python3.7/site-packages/selenium/webdriver/remote/webelement.py in _execute(self, command, params)
    631             params = {}
    632         params['id'] = self._id
--> 633         return self._parent.execute(command, params)
    634 
    635     def find_element(self, by=By.ID, value=None):

~/anaconda3/lib/python3.7/site-packages/selenium/webdriver/remote/webdriver.py in execute(self, driver_command, params)
    319         response = self.command_executor.execute(driver_command, params)
    320         if response:
--> 321             self.error_handler.check_response(response)
    322             response['value'] = self._unwrap_value(
    323                 response.get('value', None))

~/anaconda3/lib/python3.7/site-packages/selenium/webdriver/remote/errorhandler.py in check_response(self, response)
    240                 alert_text = value['alert'].get('text')
    241             raise exception_class(message, screen, stacktrace, alert_text)
--> 242         raise exception_class(message, screen, stacktrace)
    243 
    244     def _value_or_default(self, obj, key, default):

StaleElementReferenceException: Message: stale element reference: element is not attached to the page document
  (Session info: chrome=83.0.4103.61)

I am not sure where I am going wrong.

1 Answer

Move links = driver.find_elements_by_class_name("a-size-mini") inside your for page in range(...) loop, so that it runs again on every page. Once you move to the next page, the links collection is no longer valid.

find_elements_by_class_name gives you a snapshot of the elements that exist on the current page. When you navigate to the next page, those DOM elements are replaced, so every reference in the old snapshot is stale and raises StaleElementReferenceException as soon as you use it.
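
As a minimal sketch of that fix, the loop could look like the following. The function name collect_product_links is hypothetical, and the sketch assumes the driver has already loaded the first results page. It passes the locator strategy as a plain string ("class name", "css selector"), which are the values behind selenium's By.CLASS_NAME and By.CSS_SELECTOR, so the same calls should also work on newer Selenium releases where the find_elements_by_* helpers are no longer available. The class names a-size-mini and a-last are taken from the question and may not match Amazon's current markup.

```python
# Locator strategy strings; these equal selenium's By.CLASS_NAME and By.CSS_SELECTOR.
CLASS_NAME = "class name"
CSS_SELECTOR = "css selector"


def collect_product_links(driver, total_pages):
    """Collect product hrefs from up to total_pages result pages.

    Assumes `driver` is a Selenium WebDriver already showing the first results page.
    """
    product_links = []
    for _ in range(total_pages):
        # Re-query on every page: elements found before navigation go stale.
        titles = driver.find_elements(CLASS_NAME, "a-size-mini")
        for title in titles:
            # Store the href string right away; plain strings cannot go stale.
            anchor = title.find_element(CSS_SELECTOR, "a")
            product_links.append(anchor.get_attribute("href"))

        # find_elements returns an empty list instead of raising when nothing matches.
        next_buttons = driver.find_elements(CLASS_NAME, "a-last")
        if not next_buttons:
            break  # no "next" control found, so treat this as the last page
        next_buttons[0].click()
    return product_links
```

If the next page loads slowly, the fresh lookup can still pick up elements from the old page. In that case it may help to add an explicit wait after the click, for example WebDriverWait together with expected_conditions.staleness_of on one of the old elements.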
