
I'm trying to create a basic web scraper for Amazon results. As I iterate through the results, I sometimes reach page 5 (sometimes only page 2) before a StaleElementReferenceException is thrown. When I look at the browser after the exception, I can see that the driver did not scroll down to the bottom bar where the page numbers are.

My code:

driver.get('https://www.amazon.com/s/ref=nb_sb_noss_1?url=search-alias%3Daps&field-keywords=sonicare+toothbrush')
    
for page in range(1, last_page_number + 1):

    driver.implicitly_wait(10)

    bottom_bar = driver.find_element_by_class_name('pagnCur')
    driver.execute_script("arguments[0].scrollIntoView(true);", bottom_bar)

    current_page_number = int(driver.find_element_by_class_name('pagnCur').text)

    if page == current_page_number:
        next_page = driver.find_element_by_xpath('//div[@id="pagn"]/span[@class="pagnLink"]/a[text()="{0}"]'.format(current_page_number + 1))
        next_page.click()
        print('page #', page, ': going to next page')
    else:
        print('page #: ', page, 'error')

I've looked at this question, and I'm guessing that a similar fix can be applied, but I'm not sure how to find something on the page that disappears. Also, based on how quickly the print statements are occurring, I can see that the implicitly_wait(10) isn't actually waiting a full 10 seconds.

The exception is pointing to the line that starts with driver.execute_script. This is the exception:

StaleElementReferenceException: Message: The element reference of <span class="pagnCur"> is stale; either the element is no longer attached to the DOM, it is not in the current frame context, or the document has been refreshed

Sometimes I'll get a ValueError:

ValueError: invalid literal for int() with base 10: ''

So these errors/exceptions lead me to believe that there is something going on with waiting for the page to refresh completely.

  • What is your scenario? What is expected output? Commented Dec 5, 2018 at 21:54
  • Once you click(), it loads a new page (with a new DOM), so on the 2nd iteration of your loop the elements are stale. Commented Dec 5, 2018 at 23:36

2 Answers


If you just want your script to iterate over all the result pages, you don't need any complicated logic. Just click the Next button for as long as it is clickable:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait as wait
from selenium.common.exceptions import TimeoutException

driver = webdriver.Chrome()

driver.get('https://www.amazon.com/s/ref=nb_sb_noss_1?url=search-alias%3Daps&field-keywords=sonicare+toothbrush')

while True:
    try:
        wait(driver, 10).until(EC.element_to_be_clickable((By.CSS_SELECTOR, 'a > span#pagnNextString'))).click()
    except TimeoutException:
        break

P.S. Also note that implicitly_wait(10) does not wait the full 10 seconds; it waits up to 10 seconds for the element to appear in the HTML DOM. So if the element is found within 1 or 2 seconds, the wait ends there and the remaining 8-9 seconds are not spent waiting.
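The "up to N seconds" behaviour can be illustrated without a browser. The following is a minimal pure-Python sketch of a polling wait; wait_until, element_appears and the polling interval are hypothetical names and values chosen for illustration, and this is not Selenium's actual implementation:

```python
import time


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Poll `condition` until it returns a truthy value or `timeout` seconds pass.

    Illustrative only: it mimics the "wait up to N seconds" idea and is not
    how Selenium itself is implemented.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            # Return immediately; the rest of the timeout is never spent.
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within {} s".format(timeout))
        time.sleep(poll_interval)


# Hypothetical stand-in for an element that shows up on the third poll.
attempts = {"count": 0}


def element_appears():
    attempts["count"] += 1
    return "element" if attempts["count"] >= 3 else None


start = time.monotonic()
found = wait_until(element_appears, timeout=10.0, poll_interval=0.1)
elapsed = time.monotonic() - start
print(found, "found after", attempts["count"], "polls")
```

Here the condition succeeds on the third poll, so the call returns after a fraction of a second even though the timeout is 10 seconds.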

3 Comments

Cleanest approach as usual.
@andersson this worked beautifully! Thank you! how did you know that 'a > span#pagnNextString' is the appropriate css selector? When I inspect the next button and copy the css selector it shows up as '#pagnNextString'. Also, thank you for explaining implicitly_wait()!
@MariahAkinbi, note that on the last page the Next button (span with id="pagnNextString") is not a child of an anchor (a), but Selenium (for some reason) still "thinks" that it is clickable. So to break the loop on the last iteration, we have to explicitly specify that we need a link with a "pagnNextString" child, not just the "pagnNextString" element.

This error message...

StaleElementReferenceException: Message: The element reference of <span class="pagnCur"> is stale; either the element is no longer attached to the DOM, it is not in the current frame context, or the document has been refreshed

...implies that the previously obtained reference to the element is now stale: the element it pointed to is no longer present in the DOM of the page.

The common reasons behind this issue are:

  • The element has changed position within the HTML.
  • The element is no longer attached to the DOM tree.
  • The webpage that the element was part of has been refreshed.
  • The previous instance of the element has been refreshed by JavaScript or an AJAX call.
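In all of these cases, the usual remedy is to look the element up again instead of reusing the old reference. As a minimal sketch, assuming a hypothetical helper named retry_on_stale and using stand-in objects so it runs without a browser (with Selenium you would pass selenium.common.exceptions.StaleElementReferenceException as the exception class):

```python
def retry_on_stale(action, stale_exception, attempts=3):
    """Call `action()` again if it raises `stale_exception`.

    `action` should re-locate the element every time it runs, so each retry
    works with a fresh reference instead of the stale one.
    """
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except stale_exception:
            if attempt == attempts:
                raise  # give up after the last allowed attempt


# Stand-ins so the sketch runs without a browser (all hypothetical).
class FakeStaleError(Exception):
    pass


calls = {"count": 0}


def read_current_page():
    # In real code this would locate the element again, for example:
    #   return driver.find_element(By.CSS_SELECTOR, "span.pagnCur").text
    calls["count"] += 1
    if calls["count"] == 1:
        raise FakeStaleError("element reference is stale")
    return "2"


page_text = retry_on_stale(read_current_page, FakeStaleError)
print("current page:", page_text)
```

The important design choice is that the lookup happens inside the retried function; retrying an operation on an element object that was located earlier would simply fail again.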

This use case

Keeping your approach of scrolling with scrollIntoView() and printing a couple of helpful debug messages, I have made some minor adjustments that add WebDriverWait. You can use the following solution:

  • Code Block:

    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC
    
    options = Options()
    options.add_argument("start-maximized")
    options.add_argument('disable-infobars')
    options.add_argument("--disable-extensions")
    driver = webdriver.Chrome(options=options, executable_path=r'C:\Utility\BrowserDrivers\chromedriver.exe')
    driver.get("https://www.amazon.com/s/ref=nb_sb_noss_1?url=search-alias%3Daps&field-keywords=sonicare+toothbrush")
    current_page_number = "?"  # placeholder in case the very first wait fails
    while True:
        try:
            # Re-locate the current-page marker on every iteration so the reference is never stale.
            current_page_number_element = WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "span.pagnCur")))
            driver.execute_script("arguments[0].scrollIntoView(true);", current_page_number_element)
            current_page_number = current_page_number_element.get_attribute("innerHTML")
            WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "span.pagnNextArrow"))).click()
            print("page # {} : going to next page".format(current_page_number))
        except Exception:
            # Reached when the Next arrow never becomes clickable (last page) or another error occurs.
            print("page # {} : error, no more pages".format(current_page_number))
            break
    driver.quit()
    
  • Console Output:

    page # 1 : going to next page
    page # 2 : going to next page
    page # 3 : going to next page
    page # 4 : going to next page
    page # 5 : going to next page
    page # 6 : going to next page
    page # 7 : going to next page
    page # 8 : going to next page
    page # 9 : going to next page
    page # 10 : going to next page
    page # 11 : going to next page
    page # 12 : going to next page
    page # 13 : going to next page
    page # 14 : going to next page
    page # 15 : going to next page
    page # 16 : going to next page
    page # 17 : going to next page
    page # 18 : going to next page
    page # 19 : going to next page
    page # 20 : error, no more pages
    

4 Comments

this works great!!! Thank you! What is the purpose of the second WebDriverWait line?
@MariahAkinbi The first WebDriverWait waits for current_page_number_element to be visible before we attempt to scroll. Once we have scrolled, the second WebDriverWait waits for element_to_be_clickable so that the solution works reliably across platforms.
okay, makes sense! If the element is visible, doesn't that mean it's clickable? Or I could skip the visible wait and only use the clickable wait - because all that matters is if it's clickable?
No, an element being visible doesn't guarantee that it's clickable. If you are not clicking, the visibility wait is sufficient, but before you attempt to click, the clickability wait is needed to make your program reliable across platforms.
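To make the visible-versus-clickable distinction concrete: to my understanding, a clickability check amounts roughly to "displayed and enabled", so a visible but disabled element fails it. The sketch below uses a hypothetical is_clickable helper and a FakeElement stand-in that only mimics the is_displayed() and is_enabled() methods of a Selenium WebElement:

```python
def is_clickable(element):
    """Rough approximation of a clickability check: displayed AND enabled."""
    return element.is_displayed() and element.is_enabled()


class FakeElement:
    """Hypothetical stand-in exposing the two WebElement methods used above."""

    def __init__(self, displayed, enabled):
        self._displayed = displayed
        self._enabled = enabled

    def is_displayed(self):
        return self._displayed

    def is_enabled(self):
        return self._enabled


visible_and_enabled = FakeElement(displayed=True, enabled=True)
visible_but_disabled = FakeElement(displayed=True, enabled=False)

print(is_clickable(visible_and_enabled))
print(is_clickable(visible_but_disabled))
```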
