Navigation using selenium in python

Question

I'm scraping this website using Python and Selenium, but it currently only scrapes the first 10 pages for the month of July. The code converts the text of the next button's previous sibling (the last page number shown) to an int and clicks next number_of_pages - 1 times. However, it stops after it gets to page 10.

URL - https://planning.adur-worthing.gov.uk/online-applications/search.do?action=monthlyList

Can anyone help me to get it to scrape all the pages?

def pagination( driver ):
    data = []
    last_element = driver.find_element_by_xpath('//a[ contains( concat( " ", normalize-space( @class ), " "), " next ") ]/preceding-sibling::a[1]')
    if last_element is None:
        number_of_pages = 1
    else:
        number_of_pages = int( last_element.text )
    # data = [ getData( driver ) ]
    data.extend(getData(driver))
    for i in range(number_of_pages - 1):
        driver.find_element_by_xpath('//a[ contains( concat( " ", normalize-space( @class ), " "), " next ") ]').click()
        data.extend( getData( driver ) )
        time.sleep(1)
    return data

Can you print number_of_pages before the for loop? I suspect that because you convert the text of the last element to int, it just shows 10 (even though there are more pages).
– NotSoShabby, Commented Aug 23, 2018 at 14:11
I just tested this out. You're right: it only turns 10 into an int and doesn't carry on for the other pages.
– Abdul Jamac, Commented Aug 23, 2018 at 14:13
As per your given link [URL - planning.adur-worthing.gov.uk/online-applications/… , I am seeing only 10 pages.
– Vardhman Patil, Commented Aug 23, 2018 at 14:18
Are you checking the month of July? If you are, press page 10 and more should come up.
– Abdul Jamac, Commented Aug 23, 2018 at 14:23

NotSoShabby · Accepted Answer · 2018-08-23 14:16:57Z

1

number_of_pages seems to have the value of 10.

Find another way to find out how many pages there are.

You can use a while loop that checks whether the "next page" button is available: if it is, keep going; otherwise, that is the last page.

Like this:

while next_button_element.is_displayed():
    # Do the action that is currently in the for loop
    ...

answered Aug 23, 2018 at 14:16


4 Comments

Abdul Jamac Over a year ago

Do you mean like this:

next_button_element = driver.find_element_by_xpath('//a[ contains( concat( " ", normalize-space( @class ), " "), " next ") ]')
while next_button_element.is_displayed():
    driver.find_element_by_xpath('//a[ contains( concat( " ", normalize-space( @class ), " "), " next ") ]').click()
    data.extend( getData( driver ) )
    time.sleep(1)
return data

Sers Over a year ago

Use simpler selectors. For the next button, a CSS selector is enough: driver.find_elements_by_css_selector('a.next')

NotSoShabby Over a year ago

No need to find the element twice. Find it once, store it in a variable, and then call is_displayed() or click() on it.

Abdul Jamac Over a year ago

This doesn't work either:

next_button_element = driver.find_elements_by_css_selector('a.next')
while next_button_element.is_displayed():
    next_button_element.click()
    data.extend( getData( driver ) )
    time.sleep(1)
return data

Sers · Accepted Answer · 2018-08-23 17:36:32Z

1

Code you can use:

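from selenium.common.exceptions import NoSuchElementException

while True:
    data.extend(getData(driver))
    try:
        driver.find_element_by_css_selector('a.next').click()
    except NoSuchElementException:
        # No "next" link on this page, so it was the last one
        break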

edited Aug 23, 2018 at 17:36

answered Aug 23, 2018 at 14:48


6 Comments

Abdul Jamac Over a year ago

Got this error: next_button_by = (By.CSS_SELECTOR, "a.next") NameError: global name 'By' is not defined

Sers Over a year ago

Add: from selenium.webdriver.common.by import By

Abdul Jamac Over a year ago

This line gave an error: if driver.find_elements(next_button_by)==0: WebDriverException: Message: invalid argument: 'using' must be a string

Sers Over a year ago

I missed getting the count: use .count or len(driver.find_elements(next_button_by))

Abdul Jamac Over a year ago

It works, thank you. However, how do I get it to stop printing this error when a.next doesn't exist? NoSuchElementException: Message: no such element: Unable to locate element: {"method":"css selector","selector":"a.next"}
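One way to avoid the exception altogether, sketched here under a few assumptions: find_elements returns an empty list when nothing matches, so the loop can test for the button instead of catching NoSuchElementException. The names scrape_all_pages, get_data (standing in for the question's getData) and delay are hypothetical. The locator is unpacked into two arguments, since passing the whole (by, value) tuple as a single argument is what triggers the "'using' must be a string" error.

```python
import time

# By.CSS_SELECTOR in selenium.webdriver.common.by is the plain string
# "css selector", so it is spelled out here to keep the sketch import-free.
NEXT_BUTTON = ("css selector", "a.next")


def scrape_all_pages(driver, get_data, delay=1.0):
    """Collect get_data(driver) from every page until no "next" link is left.

    scrape_all_pages, get_data and delay are illustrative names,
    not part of Selenium.
    """
    data = []
    while True:
        data.extend(get_data(driver))
        # find_elements returns [] when nothing matches; the * unpacks the
        # (by, value) tuple into the two arguments find_elements expects.
        next_buttons = driver.find_elements(*NEXT_BUTTON)
        if not next_buttons:
            break  # no "next" link, so this was the last page
        next_buttons[0].click()
        time.sleep(delay)  # crude pause while the next page loads
    return data
```

With the question's helper this would be called as data = scrape_all_pages(driver, getData). The button is looked up again on every iteration because each click loads a new page.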


Shivam Mishra · Accepted Answer · 2018-08-23 17:16:47Z

0

Look, I understand you took the idea of calculating the total number of pages from my answer for a previous question of yours. In the previous case since the last page number was directly available to us, it worked but that's not the case here.

Solution :

Although the number of pages is not directly available, the total number of entries is: the results page shows it in text such as "Showing 1-10 of 174".

For July this number is 174. Assuming you leave the page length (the number of entries on a single page) at the default of 10, the number of pages should be 18 (17 pages of 10 entries each and one extra page for the remaining 4 entries).

So, the logic for calculating the number of pages is simple. If you have the total number of entries in a total_entries variable, the number of pages should be:

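number_of_pages = (total_entries // 10) + 1

Floor division (//) rounds down to the nearest integer, so 174 // 10 returns 17 and adding 1 gives 18. (In Python 2 the / operator does the same for two ints; in Python 3, / returns a float, so // is needed.) So there you have it: 18 as the number of pages. Note that this formula counts one page too many when the total is an exact multiple of 10.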
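As a quick check of the page arithmetic, here is a sketch that uses ceiling division instead of "divide by 10 and add 1". This is a variation on the answer's formula, not the formula itself, and the name pages_needed and its per_page parameter are hypothetical. It gives the same result for 174 entries but avoids counting an extra page when the total is an exact multiple of the page length.

```python
def pages_needed(total_entries, per_page=10):
    """Ceiling division: pages needed to list total_entries results."""
    if total_entries <= 0:
        return 1  # assumption: an empty result list still shows one page
    return (total_entries + per_page - 1) // per_page


print(pages_needed(174))  # 18: 17 full pages plus one page with 4 entries
print(pages_needed(170))  # 17, whereas 170 // 10 + 1 would give 18
```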

Now, to extract the total number of entries. You use the below locator to find the <span> element holding that.

driver.find_element_by_xpath("//span[@class='showing']")

But this element contains text like this - Showing 1-10 of 174. You need only the 174 part from the entire string. To do that, first you extract the string after "of" and then convert it into int.

Algorithm to extract the total number of entries as int from the text:

import re

showing_text = driver.find_element_by_xpath("//span[@class='showing']").text    # Showing 1-10 of 174
number_of_entries_text = showing_text.split("of", 1)[1]                    # " 174" as text
number_of_entries = int(re.findall(r'\d+', number_of_entries_text)[0])    # 174 as int
number_of_pages = (number_of_entries // 10) + 1                            # 18
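The string handling can be tried without a browser. A minimal sketch, assuming the text always has the form "Showing 1-10 of 174" with no thousands separator in the total; parse_total_entries is a hypothetical name, and a single regex anchored on "of" is swapped in for the split-then-findall steps.

```python
import re


def parse_total_entries(showing_text):
    """Return the total from text such as 'Showing 1-10 of 174'."""
    match = re.search(r"of\s+(\d+)", showing_text)
    if match is None:
        raise ValueError("unexpected text: %r" % showing_text)
    return int(match.group(1))


print(parse_total_entries("Showing 1-10 of 174"))  # 174
```

Raising ValueError on unexpected text makes a changed page layout fail loudly instead of silently producing a wrong page count.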

Final code:

import re
import time

def pagination( driver ):
    data = []
    # find_elements_* returns an empty list (instead of raising) when nothing matches
    showing_elements = driver.find_elements_by_xpath("//span[@class='showing']")
    if not showing_elements:
        number_of_pages = 1
    else:
        showing_text = showing_elements[0].text                                   # Showing 1-10 of 174
        number_of_entries_text = showing_text.split("of", 1)[1]                   # " 174" as text
        number_of_entries = int(re.findall(r'\d+', number_of_entries_text)[0])    # 174 as int
        number_of_pages = (number_of_entries // 10) + 1                           # 18

    data.extend(getData(driver))    # scrape the first page before clicking next
    for i in range(number_of_pages - 1):
        driver.find_element_by_xpath('//a[ contains( concat( " ", normalize-space( @class ), " "), " next ") ]').click()
        time.sleep(1)
        data.extend(getData(driver))
    return data

Note:

I think my solution is better since you don't have to repeatedly check for any element to be available or to catch any exceptions. You just directly get the number of pages and you click the next button that many times.

edited Aug 23, 2018 at 17:16

answered Aug 23, 2018 at 16:15


3 Comments

Abdul Jamac Over a year ago

math.ceil rounds it down to the smallest integer, so that means it would skip page 18.

Abdul Jamac Over a year ago

If there is a way to get it to also go to page 18, that would be great.

Shivam Mishra Over a year ago

@AbdulJamac I am sorry, I made it more complicated than necessary. In Python 2, dividing two ints with / already rounds down (in Python 3, use // for the same result), so there is no need for math.ceil. Check my edited answer. Just dividing the total number of entries by 10 and adding 1 will do the trick. And yes, that way it will go all the way to the end, i.e. page 18.
