StaleElementException when iterating with Python

Question

I'm trying to create a basic web scraper for Amazon results. As I'm iterating through results, I sometimes get to page 5 (sometimes only page 2) of the results and then a StaleElementException is thrown. When I look at the browser after the exception is thrown, I can see that the driver/page did not scroll down to where the page numbers are (bottom bar).

My code:

driver.get('https://www.amazon.com/s/ref=nb_sb_noss_1?url=search-alias%3Daps&field-keywords=sonicare+toothbrush')
    
for page in range(1, last_page_number + 1):

    driver.implicitly_wait(10)

    bottom_bar = driver.find_element_by_class_name('pagnCur')
    driver.execute_script("arguments[0].scrollIntoView(true);", bottom_bar)

    current_page_number = int(driver.find_element_by_class_name('pagnCur').text)

    if page == current_page_number:
        next_page = driver.find_element_by_xpath('//div[@id="pagn"]/span[@class="pagnLink"]/a[text()="{0}"]'.format(current_page_number + 1))
        next_page.click()
        print('page #', page, ': going to next page')
    else:
        print('page #: ', page, 'error')

I've looked at this question, and I'm guessing that a similar fix can be applied, but I'm not sure how to find something on the page that disappears. Also, based on how quickly the print statements are occurring, I can see that the implicitly_wait(10) isn't actually waiting a full 10 seconds.

The exception is pointing to the line that starts with driver.execute_script. This is the exception:

StaleElementReferenceException: Message: The element reference of <span class="pagnCur"> is stale; either the element is no longer attached to the DOM, it is not in the current frame context, or the document has been refreshed

Sometimes I'll get a ValueError:

ValueError: invalid literal for int() with base 10: ''

So these errors/exceptions lead me to believe that there is something going on with waiting for the page to refresh completely.
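
The ValueError above comes from calling int() on an empty string, which is consistent with the page-number element being present before its text has been filled in. Here is a minimal sketch of a guard for that case. The helper name `parse_page_number` is made up for illustration and is not part of Selenium; it only assumes the element's text has already been read into a string.

```python
from typing import Optional


def parse_page_number(text: Optional[str]) -> Optional[int]:
    """Return the page number contained in `text`, or None if it is not usable yet."""
    if text is None:
        return None
    try:
        # int() tolerates surrounding whitespace but raises ValueError on ''.
        return int(text.strip())
    except ValueError:
        return None


print(parse_page_number("5"))
print(parse_page_number(""))
```

A None result can then be treated as "page not ready yet" and the lookup retried, instead of letting the exception end the loop.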

Once you click(), it loads a new page (with a new DOM), so on the second iteration of your loop the elements are stale.
– Corey Goldberg, Commented Dec 5, 2018 at 23:36

Andersson · Accepted Answer · 2018-12-05 22:14:58Z

3

If you just want your script to iterate over all the result pages, you don't need any complicated logic. Just keep clicking the Next button for as long as it is clickable:

from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait as wait

driver = webdriver.Chrome()

driver.get('https://www.amazon.com/s/ref=nb_sb_noss_1?url=search-alias%3Daps&field-keywords=sonicare+toothbrush')

while True:
    try:
        # Wait up to 10 seconds for a Next button that sits inside a link, then click it.
        wait(driver, 10).until(EC.element_to_be_clickable((By.CSS_SELECTOR, 'a > span#pagnNextString'))).click()
    except TimeoutException:
        # No clickable Next link appeared in time, so this was the last page.
        break

P.S. Also note that implicitly_wait(10) does not wait the full 10 seconds; it waits up to 10 seconds for the element to appear in the HTML DOM. If the element is found within 1 or 2 seconds, the wait ends there and you do not wait the remaining 8-9 seconds.
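
To make the "up to 10 seconds" point concrete, here is a simplified model of a polling wait. It is only a sketch of the behaviour described above, not Selenium's actual implementation; `wait_up_to` and `fake_lookup` are hypothetical names used for illustration.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def wait_up_to(timeout: float, condition: Callable[[], Optional[T]], poll: float = 0.05) -> T:
    """Poll `condition` until it returns a truthy value or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result  # found early: the rest of the timeout is not spent waiting
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within {} s".format(timeout))
        time.sleep(poll)


# Simulated element lookup that only succeeds on the third attempt.
attempts = {"count": 0}


def fake_lookup() -> Optional[str]:
    attempts["count"] += 1
    return "element" if attempts["count"] >= 3 else None


start = time.monotonic()
found = wait_up_to(10, fake_lookup)
elapsed = time.monotonic() - start
print(found, attempts["count"], elapsed < 10)
```

The lookup returns after three polls, well before the 10 second limit, which matches how quickly the print statements appeared in the question.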

edited Dec 5, 2018 at 22:14

answered Dec 5, 2018 at 22:09

Andersson


3 Comments

SIM Over a year ago

Cleanest approach as usual.

Mariah Akinbi Over a year ago

@andersson this worked beautifully! Thank you! how did you know that 'a > span#pagnNextString' is the appropriate css selector? When I inspect the next button and copy the css selector it shows up as '#pagnNextString'. Also, thank you for explaining implicitly_wait()!

Andersson Over a year ago

@MariahAkinbi, note that on the last page the Next button (span with id="pagnNextString") is not a child of an anchor (a), but Selenium (for some reason) still "thinks" that it is clickable. So to break the loop on the last iteration we have to specify explicitly that we need a link with a "pagnNextString" child, not just the "pagnNextString" element.
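
The selector difference described in the comment above can be checked without a browser. The sketch below uses the standard library's ElementTree and rough XPath equivalents of the two CSS selectors. The two markup snippets are assumptions reconstructed only from that comment, not Amazon's real HTML.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: on a middle page the span is wrapped in a link,
# on the last page it is a bare span.
MIDDLE_PAGE = '<div id="pagn"><a href="#next"><span id="pagnNextString">Next Page</span></a></div>'
LAST_PAGE = '<div id="pagn"><span id="pagnNextString">Next Page</span></div>'

# Rough XPath equivalents of the CSS selectors discussed above.
BY_ID_ONLY = ".//span[@id='pagnNextString']"       # like '#pagnNextString'
CHILD_OF_LINK = ".//a/span[@id='pagnNextString']"  # like 'a > span#pagnNextString'


def count_matches(markup: str, path: str) -> int:
    """Count the elements in `markup` that match the ElementTree path."""
    root = ET.fromstring(markup)
    return len(root.findall(path))


for name, markup in (("middle page", MIDDLE_PAGE), ("last page", LAST_PAGE)):
    print(name, count_matches(markup, BY_ID_ONLY), count_matches(markup, CHILD_OF_LINK))
```

Under these assumptions the id-only path matches on both pages, while the child-of-link path matches nothing on the last page. That is why the wait times out there and the loop ends.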

undetected Selenium · Accepted Answer · 2020-07-10 21:43:15Z

3

This error message...

StaleElementReferenceException: Message: The element reference of <span class="pagnCur"> is stale; either the element is no longer attached to the DOM, it is not in the current frame context, or the document has been refreshed

...implies that the previous reference of the element is now stale and the element reference is no longer present on the DOM of the page.

The common reasons behind this issue are:

The element has changed position within the HTML.
The element is no longer attached to the DOM tree.
The webpage the element was part of has been refreshed.
The previous instance of the element has been refreshed by JavaScript or an Ajax call.
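
All of the reasons listed above come down to holding on to a reference that the page has since replaced. One common workaround is to look the element up again whenever that happens. The sketch below shows the idea with stand-in classes so it runs without a browser; `with_fresh_element`, `FakeStaleError` and `FakeElement` are hypothetical names, not Selenium APIs.

```python
from typing import Callable, Tuple, Type, TypeVar

E = TypeVar("E")
R = TypeVar("R")


def with_fresh_element(
    locate: Callable[[], E],
    action: Callable[[E], R],
    stale_errors: Tuple[Type[BaseException], ...],
    attempts: int = 3,
) -> R:
    """Locate an element and act on it, locating it again if the reference went stale."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for _ in range(attempts):
        element = locate()  # always a fresh lookup, never a cached reference
        try:
            return action(element)
        except stale_errors as error:
            last_error = error
    raise last_error


class FakeStaleError(Exception):
    """Stand-in for Selenium's StaleElementReferenceException."""


class FakeElement:
    def __init__(self, text: str, stale: bool) -> None:
        self._text = text
        self._stale = stale

    @property
    def text(self) -> str:
        if self._stale:
            raise FakeStaleError("element is no longer attached to the DOM")
        return self._text


lookups = []


def locate() -> FakeElement:
    # The first lookup hands back a reference that is already stale.
    lookups.append(1)
    return FakeElement("2", stale=len(lookups) == 1)


page_text = with_fresh_element(locate, lambda el: el.text, (FakeStaleError,))
print(page_text, len(lookups))
```

With Selenium, `locate` would be something like `lambda: driver.find_element(By.CSS_SELECTOR, "span.pagnCur")` and `stale_errors` would be `(StaleElementReferenceException,)`. A retry like this complements explicit waits rather than replacing them.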

This use case

Preserving your approach of scrolling with scrollIntoView() and printing a couple of helpful debug messages, I have made some minor adjustments that introduce WebDriverWait. You can use the following solution:

Code Block:

from selenium import webdriver
from selenium.common.exceptions import WebDriverException
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

options = Options()
options.add_argument("start-maximized")
options.add_argument('disable-infobars')
options.add_argument("--disable-extensions")
# Selenium 3 style call, with `options=` in place of the deprecated `chrome_options=`.
# In Selenium 4 the driver path is passed through a Service object instead of executable_path.
driver = webdriver.Chrome(options=options, executable_path=r'C:\Utility\BrowserDrivers\chromedriver.exe')
driver.get("https://www.amazon.com/s/ref=nb_sb_noss_1?url=search-alias%3Daps&field-keywords=sonicare+toothbrush")
current_page_number = None  # defined up front so the except branch can always print it
while True:
    try:
        # Locate the page-number element again on every pass; the old reference is stale after a click.
        current_page_number_element = WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "span.pagnCur")))
        driver.execute_script("arguments[0].scrollIntoView(true);", current_page_number_element)
        current_page_number = current_page_number_element.get_attribute("innerHTML")
        WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "span.pagnNextArrow"))).click()
        print("page # {} : going to next page".format(current_page_number))
    except WebDriverException:
        # Covers TimeoutException and other Selenium errors without hiding unrelated bugs.
        print("page # {} : error, no more pages".format(current_page_number))
        break
driver.quit()

Console Output:

page # 1 : going to next page
page # 2 : going to next page
page # 3 : going to next page
page # 4 : going to next page
page # 5 : going to next page
page # 6 : going to next page
page # 7 : going to next page
page # 8 : going to next page
page # 9 : going to next page
page # 10 : going to next page
page # 11 : going to next page
page # 12 : going to next page
page # 13 : going to next page
page # 14 : going to next page
page # 15 : going to next page
page # 16 : going to next page
page # 17 : going to next page
page # 18 : going to next page
page # 19 : going to next page
page # 20 : error, no more pages

edited Jul 10, 2020 at 21:43

answered Dec 6, 2018 at 6:55

undetected Selenium


4 Comments

Mariah Akinbi Over a year ago

this works great!!! Thank you! What is the purpose of the second WebDriverWait line?

undetected Selenium Over a year ago

@MariahAkinbi The first WebDriverWait waits for current_page_number_element to be visible before we attempt to scroll. Once we have scrolled, the second WebDriverWait waits for element_to_be_clickable so that the solution works reliably across platforms.

Mariah Akinbi Over a year ago

okay, makes sense! If the element is visible, doesn't that mean it's clickable? Or I could skip the visible wait and only use the clickable wait - because all that matters is if it's clickable?

undetected Selenium Over a year ago

No, an element being visible doesn't guarantee that it's clickable. If you are not clicking, the visibility wait is sufficient, but before you attempt to click, the clickable wait is needed to make your program work reliably across platforms.
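
As a rough illustration of why visible and clickable are different checks: Selenium documents element_to_be_clickable as requiring the element to be both visible and enabled. The sketch below models that with a duck-typed stand-in instead of a real WebElement; `FakeElement`, `is_visible` and `is_clickable` are hypothetical names.

```python
class FakeElement:
    """Stand-in exposing the two WebElement methods used below."""

    def __init__(self, displayed: bool, enabled: bool) -> None:
        self._displayed = displayed
        self._enabled = enabled

    def is_displayed(self) -> bool:
        return self._displayed

    def is_enabled(self) -> bool:
        return self._enabled


def is_visible(element) -> bool:
    """Visibility only asks whether the element is displayed."""
    return element.is_displayed()


def is_clickable(element) -> bool:
    """Clickability, as modelled here, needs the element displayed and enabled."""
    return element.is_displayed() and element.is_enabled()


# A greyed-out button: rendered on the page but disabled.
disabled_button = FakeElement(displayed=True, enabled=False)
print(is_visible(disabled_button), is_clickable(disabled_button))
```

A disabled control passes the visibility check and fails the clickable one, so waiting on clickability is the stricter condition before calling click().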
