
I am building a scraping application for Hants.gov.uk, and right now I am working on getting it to click through the result pages rather than scrape them. When it reached the last row on page 1 it simply stopped, so I made it click the "Next Page" button, but first it has to go back to the original URL. It clicks through to page 2, but after page 2 is processed it doesn't move on to page 3; it just restarts page 2.

Can somebody help me fix this issue?

Code:

import time
import config # Don't worry about this. This is an external file to make a DB
import urllib.request
from bs4 import BeautifulSoup
from selenium import webdriver

url = "https://planning.hants.gov.uk/SearchResults.aspx?RecentDecisions=True"

driver = webdriver.Chrome(executable_path=r"C:\Users\Goten\Desktop\chromedriver.exe")
driver.get(url)

driver.find_element_by_id("mainContentPlaceHolder_btnAccept").click()

def start():
    elements = driver.find_elements_by_css_selector(".searchResult a")
    links = [link.get_attribute("href") for link in elements]

    result = []
    for link in links:
        if link not in result:
            result.append(link)
        else:
            driver.get(link)
            goUrl = urllib.request.urlopen(link)
            soup = BeautifulSoup(goUrl.read(), "html.parser")
            #table = soup.find_element_by_id("table", {"class": "applicationDetails"})
            for i in range(20):
                pass # Don't worry about all this commented code, it isn't relevant right now
                #table = soup.find_element_by_id("table", {"class": "applicationDetails"})
                #print(table.text)
            #   div = soup.select("div.applicationDetails")
            #   getDiv = div[i].split(":")[1].get_text()
            #   log = open("log.txt", "a")
            #   log.write(getDiv + "\n")
            #log.write("\n")

start()
driver.get(url)

for i in range(5):
    driver.find_element_by_id("ctl00_mainContentPlaceHolder_lvResults_bottomPager_ctl02_NextButton").click()
    url = driver.current_url
    start()
    driver.get(url)
driver.close()

3 Answers


try this:

import time
# import config # Don't worry about this. This is an external file to make a DB
import urllib.request
from bs4 import BeautifulSoup
from selenium import webdriver

url = "https://planning.hants.gov.uk/SearchResults.aspx?RecentDecisions=True"

driver = webdriver.Chrome()
driver.get(url)

driver.find_element_by_id("mainContentPlaceHolder_btnAccept").click()

result = []


def start():
    elements = driver.find_elements_by_css_selector(".searchResult a")
    links = [link.get_attribute("href") for link in elements]
    result.extend(links)

def start2():
    for link in result:
        # if link not in result:
        #     result.append(link)
        # else:
            driver.get(link)
            goUrl = urllib.request.urlopen(link)
            soup = BeautifulSoup(goUrl.read(), "html.parser")
            #table = soup.find_element_by_id("table", {"class": "applicationDetails"})
            for i in range(20):
                pass # Don't worry about all this commented code, it isn't relevant right now
                #table = soup.find_element_by_id("table", {"class": "applicationDetails"})
                #print(table.text)
            #   div = soup.select("div.applicationDetails")
            #   getDiv = div[i].split(":")[1].get_text()
            #   log = open("log.txt", "a")
            #   log.write(getDiv + "\n")
            #log.write("\n")


while True:
    start()
    element = driver.find_element_by_class_name('rdpPageNext')
    try:
        # On the last page the Next arrow's onclick becomes "return false;",
        # which is the signal to stop paginating.
        check = element.get_attribute('onclick')
        if check != "return false;":
            element.click()
        else:
            break

    except:
        break

print(result)
start2()
driver.get(url)

9 Comments

Yeah, but the code is also required to check through each application too; there are 7 on each page.
It is checking; I used a while loop.
Use sleep() in between loop iterations, as per your requirement. I can't run the code right now, but I think this will work fine.
I thought your problem was getting through each page, so I solved only that. You have to add your other code to get the data from the table; you can add it just after the line while True:, as sketched below.
Tell me whether this is working or not; if it is, I will explain the logic.
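
To illustrate the suggestion in those comments, here is a minimal sketch of how the per-page scraping could sit inside the while True: loop from this answer. It assumes the driver and imports set up in the code above, and scrape_current_page() is a hypothetical stand-in for whatever table-parsing code ends up being used:

def scrape_current_page():
    # Hypothetical placeholder for the asker's own table-parsing code,
    # e.g. collecting the .searchResult links on the currently loaded
    # results page and reading each application's details table.
    pass

while True:
    scrape_current_page()   # scrape the current results page first
    time.sleep(2)           # brief pause between pages, as suggested above
    element = driver.find_element_by_class_name('rdpPageNext')
    try:
        # On the last page the Next arrow's onclick is "return false;",
        # which is the signal to stop paginating.
        if element.get_attribute('onclick') == "return false;":
            break
        element.click()
    except Exception:
        break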

As per the URL https://planning.hants.gov.uk/SearchResults.aspx?RecentDecisions=True, to click through all the pages you can use the following solution:

  • Code Block:

    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    
    options = Options()
    options.add_argument("start-maximized")
    options.add_argument("disable-infobars")
    options.add_argument("--disable-extensions")
    driver = webdriver.Chrome(chrome_options=options, executable_path=r'C:\Utility\BrowserDrivers\chromedriver.exe')
    driver.get('https://planning.hants.gov.uk/SearchResults.aspx?RecentDecisions=True')
    WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.ID, "mainContentPlaceHolder_btnAccept"))).click()
    numLinks = len(WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "div#ctl00_mainContentPlaceHolder_lvResults_topPager div.rdpWrap.rdpNumPart>a"))))
    print(numLinks)
    for i in range(numLinks):
        print("Perform your scrapping here on page {}".format(str(i+1)))
        WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//div[@id='ctl00_mainContentPlaceHolder_lvResults_topPager']//div[@class='rdpWrap rdpNumPart']//a[@class='rdpCurrentPage']/span//following::span[1]"))).click()
    driver.quit()
    
  • Console Output:

    8
    Perform your scraping here on page 1
    Perform your scraping here on page 2
    Perform your scraping here on page 3
    Perform your scraping here on page 4
    Perform your scraping here on page 5
    Perform your scraping here on page 6
    Perform your scraping here on page 7
    Perform your scraping here on page 8
    

3 Comments

Although this is a splendid idea, I would like to accomplish this task with mostly my own code; I am just trying to figure it out :) Thank you though
@FeitanPortor We are aware of neither your requirement nor your use case. You have raised your question and contributors are trying to help you out in their own capacity. Feel free to use either the code or the logic within :) it is your choice.
I know. This doesn't precisely answer my question. I upvoted earlier

Hi @Feitan Portor, you have written the code absolutely fine. The only reason you are redirected back to the first page is that you set url = driver.current_url in the last for loop: the URL stays static, and it is only the JavaScript that triggers the Next click event. So just remove url = driver.current_url and driver.get(url) and you are good to go; I have tested this myself.

Also, to see which page your scraper is currently on, just add this part in the for loop:

ss = driver.find_element_by_class_name('rdpCurrentPage').text
print(ss)

Hope this solves your confusion.
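
For reference, a minimal sketch of what the question's final loop would look like with those two lines removed, as this answer suggests (the element ID and range(5) are taken straight from the question's code, and the page-counter snippet above is dropped into the loop):

for i in range(5):
    driver.find_element_by_id("ctl00_mainContentPlaceHolder_lvResults_bottomPager_ctl02_NextButton").click()
    ss = driver.find_element_by_class_name('rdpCurrentPage').text   # report which page the scraper is on
    print(ss)
    start()   # scrape the page that the Next click just loaded
driver.close()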

1 Comment

I get an error (pastebin.com/jZPCpdjB). It manages to get to page 2, but no further.
