
I am trying to scrape data from this webpage

http://stats.espncricinfo.com/ci/engine/stats/index.html?class=1;team=5;template=results;type=batting

I need to copy the contents of the table into a CSV file, then go to the next page and append the contents of that page to the same file. I am able to scrape the table, but when I try to loop over clicking the Next button using Selenium WebDriver's click, it goes to the next page and then stops. This is my code.

from selenium import webdriver
from bs4 import BeautifulSoup

driver = webdriver.Chrome(executable_path='path')
url = 'http://stats.espncricinfo.com/ci/engine/stats/index.html?class=1;team=5;template=results;type=batting'

def data_from_cricinfo(url):
    driver.get(url)
    pgsource = str(driver.page_source)
    soup = BeautifulSoup(pgsource, 'html5lib')
    data = soup.find_all('div', class_='engineTable')
    for tr in data:
        info = tr.find_all('tr')
        # grab data

    next_link = driver.find_element_by_class_name('PaginationLink')
    next_link.click()

data_from_cricinfo(url)

Is there any way to click Next for all pages in a loop and copy the contents of every page into the same file? Thanks in advance.

  • The program stops because you're not looping anywhere after you click the next_link. Think about where you can add a loop so that the part of the code you want runs for all the pages. Commented Feb 14, 2018 at 12:32
  • Sorry, I should have been clearer. That is exactly where I'm getting stuck: how to add the loop after defining the function. I could loop over the page number in the URL, but I want to know if there is a way to loop the click itself, since that would also work in scenarios where the URL remains the same even after changing pages. Commented Feb 14, 2018 at 15:04
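
To illustrate the approach discussed in the comments, here is a rough sketch of looping the click itself, with an explicit wait so it also works when the URL stays the same between pages; the selectors ('engineTable', the 'Next' link text) and the 10-second timeout are assumptions rather than values verified against this page:

from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

url = 'http://stats.espncricinfo.com/ci/engine/stats/index.html?class=1;team=5;template=results;type=batting'

driver = webdriver.Chrome()
driver.get(url)

while True:
    # parse the current page here, e.g. hand driver.page_source to BeautifulSoup
    table = driver.find_element_by_class_name('engineTable')

    try:
        next_link = driver.find_element_by_partial_link_text('Next')
    except NoSuchElementException:
        break  # no "Next" link on the last page

    next_link.click()
    # wait until the old table element is detached from the DOM, i.e. the
    # next page has rendered; this works even if the URL never changes
    WebDriverWait(driver, 10).until(EC.staleness_of(table))

driver.quit()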

1 Answer


You can do something like the following to traverse all the pages (via the Next button) and parse the data from the table:

from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from bs4 import BeautifulSoup

URL = 'http://stats.espncricinfo.com/ci/engine/stats/index.html?class=1;team=5;template=results;type=batting'

driver = webdriver.Chrome()
driver.get(URL)

while True:
    soup = BeautifulSoup(driver.page_source, 'html5lib')
    # the stats table is the third 'engineTable' element on the page
    table = soup.find_all(class_='engineTable')[2]
    for info in table.find_all('tr'):
        data = [item.text for item in info.find_all("td")]
        print(data)

    try:
        driver.find_element_by_partial_link_text('Next').click()
    except NoSuchElementException:
        break  # the last page has no "Next" link

driver.quit()
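
Since the original goal was to append every page into one CSV file, the parsed rows can be written out as they are scraped. Below is a minimal sketch building on the answer's loop; the output filename ('batting_stats.csv') is a placeholder, and the table index assumes the same page layout as above:

import csv

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException

URL = 'http://stats.espncricinfo.com/ci/engine/stats/index.html?class=1;team=5;template=results;type=batting'

driver = webdriver.Chrome()
driver.get(URL)

# 'batting_stats.csv' is a placeholder output path
with open('batting_stats.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    while True:
        soup = BeautifulSoup(driver.page_source, 'html5lib')
        table = soup.find_all(class_='engineTable')[2]   # same table index as in the answer
        for row in table.find_all('tr'):
            cells = [td.get_text(strip=True) for td in row.find_all('td')]
            if cells:                      # header rows contain only <th>, so skip them
                writer.writerow(cells)
        try:
            driver.find_element_by_partial_link_text('Next').click()
        except NoSuchElementException:     # no "Next" link on the last page
            break

driver.quit()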