
I was scraping a website that splits its listings across multiple pages within one web page. When I click page 2, the URL shows http://www.worldhospitaldirectory.com/Germany/hospitals#page-2.

I put this URL in as the next navigation location, but the browser goes straight back to http://www.worldhospitaldirectory.com/Germany/hospitals#page-1, which is the default page.

I don't know how to navigate to these sub-pages. Any suggestions or code?

My code so far:

from selenium import webdriver
from selenium.webdriver.common.keys import Keys

driver = webdriver.Firefox()
driver.get('http://www.worldhospitaldirectory.com/Germany/hospitals')
url = []
pagenbr = 1

while pagenbr <= 43:
    current = driver.current_url
    driver.get(current)
    lks = driver.find_elements_by_xpath('//*[@href]')
    for ii in lks:
        link = ii.get_attribute('href')
        if '/info' in link:
            url.extend(link)
            print(link)
    print('page ' + str(pagenbr) + ' is done.')
    elm = driver.find_element_by_link_text('Next')
    driver.implicitly_wait(10)
    elm.click()
    pagenbr += 1
2 Comments

  • Can you provide the code that you're using? Commented Feb 6, 2017 at 14:54
  • Sure, I will update my code there. @brittenb Commented Feb 6, 2017 at 15:03

3 Answers


Try just clicking the appropriate pagination button:

driver.find_element_by_link_text('Next') # to get next page

or

driver.find_element_by_link_text('2') # to get second page

9 Comments

I updated my code. It now iterates to a new page, but after that my code cannot pull the links as it did the first time. Please give me some advice.
What do you expect the line url.extend(link) to do? Do you mean url.append(link)?
Yes. I forgot to change it to append.
Do you know what is wrong with my code? After it gets the links from the first page, it cannot get links from pages 2, 3, etc.
What is the output of your code? Do you get any exceptions, perhaps about a stale element?
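The extend/append mix-up discussed above is easy to demonstrate in plain Python: list.extend iterates its argument, so a string is split into single characters, while list.append stores the whole string as one element.

```python
# extend() iterates its argument, so a URL string is split into characters:
url = []
url.extend('http://example.com/info')
print(len(url))   # 23 single-character entries, not one link

# append() stores the whole string as a single element:
url = []
url.append('http://example.com/info')
print(len(url))   # 1
```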

Get the button element:

button_next = driver.find_element_by_xpath("//a[@class='page-link next']")
button_next.click()

I leave iterating over all the pages to you.
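A sketch of such a loop, assuming the next button keeps the class page-link next and that a short fixed pause is enough for each page to load (both are assumptions, not verified against the site):

```python
import time

def iterate_pages(driver, pages):
    """Collect '/info' links from each page, clicking the next button in between."""
    urls = []
    for _ in range(pages):
        # Gather the links on the current page.
        for el in driver.find_elements_by_xpath('//*[@href]'):
            link = el.get_attribute('href')
            if '/info' in link:
                urls.append(link)
        # Move to the next page and give it a moment to load.
        button_next = driver.find_element_by_xpath("//a[@class='page-link next']")
        button_next.click()
        time.sleep(2)  # crude pause; an explicit wait would be more robust
    return urls
```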

4 Comments

Thanks, but after I iterate to each new page, I cannot make my loop pull links from the new page. Would you take a look? I will update my code now.
You will probably have to sleep after the click, because script execution is faster than the page load.
Yeah, but I put waiting time there to wait for it to load fully.
No, driver.implicitly_wait(10) is NOT like sleep; it is the MAX time that functions like find_element will wait to find an element on the page.

This worked for me:

while pagenbr <= 3:
    current = driver.current_url
    print(current)
    driver.get(current)
    lks = driver.find_elements_by_xpath('//*[@href]')
    for ii in lks:
        link = ii.get_attribute('href')
        if '/info' in link:
            url.append(link)
            print(link)
    print('page ' + str(pagenbr) + ' is done.')
    elm = driver.find_element_by_link_text('Next')
    driver.implicitly_wait(10)
    elm.click()
    driver.implicitly_wait(10)
    lks = driver.find_elements_by_xpath('//*[@href]')
    for ii in lks:
        link = ii.get_attribute('href')
        if '/info' in link:
            url.append(link)
            print(link)
    pagenbr += 1
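Note that this loop collects links both before and after each click, so the url list will contain duplicates. A small order-preserving dedupe (a generic helper, not part of the original code) can clean it up afterwards:

```python
def dedupe(links):
    """Remove duplicate links while preserving first-seen order."""
    seen = set()
    unique = []
    for link in links:
        if link not in seen:
            seen.add(link)
            unique.append(link)
    return unique

print(dedupe(['a/info', 'b/info', 'a/info']))  # → ['a/info', 'b/info']
```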

Comments
