I've spent quite a bit of time on this and am hoping to get some help. I'm new to Python and web scraping.

I'm accessing a website using credentials, so I can't share the link, but it's fairly straightforward and I have most of the code. Using Selenium, I'm able to access the website, input my credentials, access a table, pull in the data I want, create a data frame, and go to the next page. However, I would like to loop through all pages automatically (with some pauses, to be kind to the site) and append each page to a master list. This is what I have so far:

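from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get('website')  # placeholder for the real URL
username = driver.find_element(By.ID, "username")
password = driver.find_element(By.ID, "password")

username.send_keys("username")
password.send_keys("password" + "\n")  # the trailing newline submits the form

# Implicit wait: element lookups retry for up to 20 seconds before failing
driver.implicitly_wait(20)

table = driver.find_element(By.ID, 'preblockBody')

information = []
# Note: an XPath starting with "//" searches the whole page, not just inside `table`
job_elems = table.find_elements(By.XPATH, "//*[contains(@class,'pbListingTable')]")
for value in job_elems:
    #print(value.text)
    information.append(value.text)

# Click the link to page 2 through JavaScript
nxt = driver.find_element(By.XPATH, "//a[contains(@href, 'gotoNextPage(2)')]")
driver.execute_script("arguments[0].click();", nxt)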

I think the best route is to find all the links whose href contains 'gotoNextPage' and loop over them, but I'm unsure how to do so. Any help is very much appreciated.
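
One way to sketch that idea, assuming each pagination link carries an href of the form gotoNextPage(N). Only gotoNextPage(2) is confirmed by the XPath above; the "javascript:" prefix and the other sample values are made up for illustration, and page_numbers is a hypothetical helper that extracts the page numbers so they can be looped over.

```python
import re


def page_numbers(hrefs):
    """Return the sorted, de-duplicated page numbers found in the hrefs."""
    found = set()
    for href in hrefs:
        # Assumed href shape: "...gotoNextPage(<digits>)..."
        match = re.search(r"gotoNextPage\((\d+)\)", href)
        if match:
            found.add(int(match.group(1)))
    return sorted(found)


# Made-up sample hrefs standing in for the real pagination links.
sample_hrefs = [
    "javascript:gotoNextPage(2)",
    "javascript:gotoNextPage(3)",
    "javascript:gotoNextPage(2)",
    "#top",
]
pages_to_visit = page_numbers(sample_hrefs)
print(pages_to_visit)
```

With Selenium, the hrefs would come from calling get_attribute("href") on each link matched by the 'gotoNextPage' XPath. Pagination bars often list only a few pages at a time, so the list may need to be rebuilt after each page load.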

1 Answer

UPDATE 1:

I've found something helpful: I click the 'Next' link instead of the specific 'gotoNextPage' element. Here is my new code. However, it only keeps the last page of info rather than accumulating each page as it goes through them. This is very close!

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get('website')  # placeholder for the real URL
username = driver.find_element(By.ID, "username")
password = driver.find_element(By.ID, "password")

username.send_keys("user name")
password.send_keys("password" + "\n")

while True:
    driver.implicitly_wait(30)
    table = driver.find_element(By.ID, 'preblockBody')
    information = []  # re-created on every pass through the loop
    job_elems = table.find_elements(By.XPATH, "//*[contains(@class,'pbListingTable')]")
    for value in job_elems:
        #print(value.text)
        information.append(value.text)

    try:
        driver.find_element(By.PARTIAL_LINK_TEXT, 'Next').click()
    except Exception:
        # No 'Next' link (or the click failed): treat this as the last page
        break

driver.quit()
print(information)

1 Comment

I was able to figure this out by moving my empty list out of the loop. Simple, but loops can be confusing to a newbie like me.
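
The fix described in this comment can be sketched as follows. This is a minimal illustration, not the poster's final code: collect_all_pages, read_page, and go_to_next are hypothetical names, and an in-memory list of pages stands in for the real site so the example runs without a browser. The key point is that the accumulator list is created once, before the loop.

```python
import time


def collect_all_pages(read_page, go_to_next, pause_seconds=0.0):
    """Accumulate the rows from every page into a single list."""
    information = []  # created once, before the loop, so nothing is lost
    while True:
        information.extend(read_page())
        if not go_to_next():
            break  # no further page: stop
        time.sleep(pause_seconds)  # optional polite pause between pages
    return information


# Stand-in for the real site: three "pages" held in memory.
pages = [["row 1", "row 2"], ["row 3"], ["row 4", "row 5"]]
position = {"index": 0}


def read_page():
    """Return the rows of the current fake page."""
    return pages[position["index"]]


def go_to_next():
    """Advance to the next fake page; return False when on the last one."""
    if position["index"] + 1 >= len(pages):
        return False
    position["index"] += 1
    return True


all_rows = collect_all_pages(read_page, go_to_next)
print(all_rows)
```

With Selenium, read_page would return the .text of the elements found on the current page, and go_to_next would click the 'Next' link and return False once that link can no longer be found. The pause_seconds argument is where a delay between page loads would go.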
