I need to paginate through pages and save HTML of each page in a list.

The HTML looks like this. On the first page, the first element, with class="sc-4j28w0-1 fDeSdf", is an arrow '>':

<li disabled="" class="sc-4j28w0-1 fDeSdf"></li>
<li data-testid="current-page-item" class="sc-4j28w0-1 sc-4j28w0-2 jDlZyl">1</li>
<li class="sc-4j28w0-1 lhEbhI"><span class="sc-4j28w0-3 jAKnhT">2</span></li>
<li class="sc-4j28w0-1 lhEbhI"><span class="sc-4j28w0-3 jAKnhT">3</span></li>
<li class="sc-4j28w0-1 lhEbhI"></li>

For the second and subsequent pages (not the last):

<li class="sc-4j28w0-1 lhEbhI"></li>
<li class="sc-4j28w0-1 lhEbhI"><span class="sc-4j28w0-3 jAKnhT">1</span></li>
<li data-testid="current-page-item" class="sc-4j28w0-1 sc-4j28w0-2 jDlZyl">2</li>
<li class="sc-4j28w0-1 lhEbhI"><span class="sc-4j28w0-3 jAKnhT">3</span></li>
<li class="sc-4j28w0-1 lhEbhI"></li>

On the last page, the last element, with class="sc-4j28w0-1 fDeSdf", is an arrow '<':

<li class="sc-4j28w0-1 lhEbhI"></li>
<li class="sc-4j28w0-1 lhEbhI"><span class="sc-4j28w0-3 jAKnhT">1</span></li>
<li class="sc-4j28w0-1 lhEbhI"><span class="sc-4j28w0-3 jAKnhT">2</span></li>
<li data-testid="current-page-item" class="sc-4j28w0-1 sc-4j28w0-2 jDlZyl">3</li>
<li disabled="" class="sc-4j28w0-1 fDeSdf"></li>

So on the first or the last page, the disabled arrow has the class 'sc-4j28w0-1 fDeSdf'.

I tried to paginate using a while loop:

#  list for html pages 
news_list = []

while True: 
    wait = WebDriverWait(driver, 10) 

    #  by clicking on the last element of pagination == >
    search = wait.until(EC.presence_of_element_located((By.XPATH, '/html/body/div/div/div[2]/div[2]/div/ol/li[5]')))
   # if it is active click
    if search.is_enabled():
        search.click()
        time.sleep(5)
        html = driver.page_source
        soup_news = BeautifulSoup(html)
        news_list.append(soup_news)
    else:
        pass

But the problem is that the loop doesn't stop; it keeps saving the last page.

I have also tried it like this:

wait = WebDriverWait(driver, 10) 

search = wait.until(EC.element_to_be_clickable((By.XPATH, '/html/body/div/div/div[2]/div[2]/div/ol/li[5]')))

while search.get_property('disabled') is False:
    search.click()
    time.sleep(5)
    html = driver.page_source
    soup_news = BeautifulSoup(html)
    news_list.append(soup_news)

But then I get an error:

---------------------------------------------------------------------------
StaleElementReferenceException            Traceback (most recent call last)
<ipython-input-51-49e862d6475f> in <module>
     34 
     35 
---> 36 while search.is_enabled():
     37     try:
     38         search.click()

~\AppData\Local\Continuum\anaconda3\lib\site-packages\selenium\webdriver\remote\webelement.py in is_enabled(self)
    157     def is_enabled(self):
    158         """Returns whether the element is enabled."""
--> 159         return self._execute(Command.IS_ELEMENT_ENABLED)['value']
    160 
    161     def find_element_by_id(self, id_):

~\AppData\Local\Continuum\anaconda3\lib\site-packages\selenium\webdriver\remote\webelement.py in _execute(self, command, params)
    631             params = {}
    632         params['id'] = self._id
--> 633         return self._parent.execute(command, params)
    634 
    635     def find_element(self, by=By.ID, value=None):

~\AppData\Local\Continuum\anaconda3\lib\site-packages\selenium\webdriver\remote\webdriver.py in execute(self, driver_command, params)
    319         response = self.command_executor.execute(driver_command, params)
    320         if response:
--> 321             self.error_handler.check_response(response)
    322             response['value'] = self._unwrap_value(
    323                 response.get('value', None))

~\AppData\Local\Continuum\anaconda3\lib\site-packages\selenium\webdriver\remote\errorhandler.py in check_response(self, response)
    240                 alert_text = value['alert'].get('text')
    241             raise exception_class(message, screen, stacktrace, alert_text)
--> 242         raise exception_class(message, screen, stacktrace)
    243 
    244     def _value_or_default(self, obj, key, default):

StaleElementReferenceException: Message: The element reference of <li class="sc-4j28w0-1 lhEbhI"> is stale; either the element is no longer attached to the DOM, it is not in the current frame context, or the document has been refreshed

Appreciate any help

  • Did you mean else: break instead of else: pass? Commented Dec 9, 2019 at 14:46
  • Tried both, doesn't work. Commented Dec 9, 2019 at 14:46

1 Answer

There are different ways you can approach paginating here. I'll highlight one:

  1. get the current page number
  2. search for the link to the next page number; exit if it is not found

The Code:

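from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.by import By

while True:
    # The highlighted <li> holds the number of the page currently shown.
    current_page_number = int(driver.find_element(By.CSS_SELECTOR, 'li[data-testid=current-page-item]').text)

    print(f"Processing page {current_page_number}..")

    # TODO: save the page (before clicking, so the last page is saved too)

    try:
        # Look the link up again on every pass; an element found on a
        # previous page becomes stale once the pagination is re-rendered.
        next_page_link = driver.find_element(By.XPATH, f'.//li[span = "{current_page_number + 1}"]')
        next_page_link.click()
    except NoSuchElementException:
        print(f"Exiting. Last page: {current_page_number}.")
        break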
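
One possible way to fill in the "save the page" step is sketched below. It is only a sketch under a few assumptions: the helper names collect_pages and current_page_number, the timeout and poll parameters, and the idea of polling the highlighted page number instead of a fixed time.sleep(5) are illustrative and not part of the original code. The locator strategies are passed as the plain strings "css selector" and "xpath", which are the values of By.CSS_SELECTOR and By.XPATH, so the function works with any driver object passed in. On a real site the pagination bar may briefly disappear while it re-renders, which could need extra exception handling that is left out here.

```python
import time

# Selector taken from the HTML shown in the question.
CURRENT_PAGE = "li[data-testid=current-page-item]"


def current_page_number(driver):
    """Read the highlighted page number from the pagination bar."""
    # "css selector" is the value of selenium's By.CSS_SELECTOR constant.
    return int(driver.find_element("css selector", CURRENT_PAGE).text)


def collect_pages(driver, timeout=10.0, poll=0.2):
    """Return the HTML of every page, following the numbered links in order."""
    pages = []
    while True:
        page = current_page_number(driver)
        pages.append(driver.page_source)  # save before navigating away

        # Re-locate the link on every pass: elements found on an earlier
        # page go stale once the pagination bar is re-rendered.
        # "xpath" is the value of selenium's By.XPATH constant.
        links = driver.find_elements("xpath", f'.//li[span = "{page + 1}"]')
        if not links:
            return pages  # no link to page + 1, so this was the last page
        links[0].click()

        # Poll until the highlighted number changes instead of sleeping blindly.
        deadline = time.monotonic() + timeout
        while current_page_number(driver) != page + 1:
            if time.monotonic() > deadline:
                raise TimeoutError(f"page {page + 1} did not load")
            time.sleep(poll)
```

With a real browser this would be called as news_list = collect_pages(driver) once the first page is open; each entry can then be passed to BeautifulSoup as in the question. Using find_elements (plural) returns an empty list when nothing matches, so the loop ends without needing to catch NoSuchElementException.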

2 Comments

Hello, I have tried it and got an error: TypeError: int() argument must be a string, a bytes-like object or a number, not 'FirefoxWebElement'
@AnnaDmitrieva oh yeah, I forgot the .text there. An'ka, check it please :)
