
I am trying to scrape data from this webpage

http://stats.espncricinfo.com/ci/engine/stats/index.html?class=1;team=5;template=results;type=batting

I need to copy the contents of the table into a CSV file, then go to the next page and append the contents of that page to the same file. I am able to scrape the table, but when I try to loop over clicking the Next button using Selenium WebDriver's click, it goes to the next page and then stops. This is my code.

from selenium import webdriver
from bs4 import BeautifulSoup

driver = webdriver.Chrome(executable_path='path')
url = 'http://stats.espncricinfo.com/ci/engine/stats/index.html?class=1;team=5;template=results;type=batting'

def data_from_cricinfo(url):
    driver.get(url)
    pgsource = str(driver.page_source)
    soup = BeautifulSoup(pgsource, 'html5lib')
    data = soup.find_all('div', class_='engineTable')
    for tr in data:
        info = tr.find_all('tr')
        # grab data

    next_link = driver.find_element_by_class_name('PaginationLink')
    next_link.click()

data_from_cricinfo(url)

Is there any way to click Next for all pages in a loop and copy the contents of every page into the same file? Thanks in advance.

  • The program stops because you're not looping anywhere after you click the next_link. Think about where you can add a loop so that the part of the code you want runs for all the pages. Commented Feb 14, 2018 at 12:32
  • Sorry, I should have been clearer. That is exactly where I'm getting stuck: how to add the loop after defining the function. I could loop over the page number in the URL, but I want to know if there is a way to loop the click itself, since that would also work in scenarios where the URL remains the same even after changing pages. Commented Feb 14, 2018 at 15:04
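
To illustrate the approach discussed in the comments, here is a rough sketch of looping the click itself, with an explicit wait so it also works when the URL stays the same between pages; the selectors ('engineTable', the 'Next' link text) and the 10-second timeout are assumptions rather than values verified against this page:

from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

url = 'http://stats.espncricinfo.com/ci/engine/stats/index.html?class=1;team=5;template=results;type=batting'

driver = webdriver.Chrome()
driver.get(url)

while True:
    # parse the current page here, e.g. hand driver.page_source to BeautifulSoup
    table = driver.find_element_by_class_name('engineTable')

    try:
        next_link = driver.find_element_by_partial_link_text('Next')
    except NoSuchElementException:
        break  # no "Next" link on the last page

    next_link.click()
    # wait until the old table element is detached from the DOM, i.e. the
    # next page has rendered; this works even if the URL never changes
    WebDriverWait(driver, 10).until(EC.staleness_of(table))

driver.quit()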

1 Answer


You can do something like the following to traverse all the pages (via the Next button) and parse the data from the table:

from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from bs4 import BeautifulSoup

URL = 'http://stats.espncricinfo.com/ci/engine/stats/index.html?class=1;team=5;template=results;type=batting'

driver = webdriver.Chrome()
driver.get(URL)

while True:
    soup = BeautifulSoup(driver.page_source, 'html5lib')
    # the stats table is the third 'engineTable' element on the page
    table = soup.find_all(class_='engineTable')[2]
    for info in table.find_all('tr'):
        data = [item.text for item in info.find_all("td")]
        print(data)

    try:
        driver.find_element_by_partial_link_text('Next').click()
    except NoSuchElementException:
        break  # the last page has no "Next" link

driver.quit()
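
Since the original goal was to append every page into one CSV file, the parsed rows can be written out as they are scraped. Below is a minimal sketch building on the answer's loop; the output filename ('batting_stats.csv') is a placeholder, and the table index assumes the same page layout as above:

import csv

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException

URL = 'http://stats.espncricinfo.com/ci/engine/stats/index.html?class=1;team=5;template=results;type=batting'

driver = webdriver.Chrome()
driver.get(URL)

# 'batting_stats.csv' is a placeholder output path
with open('batting_stats.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    while True:
        soup = BeautifulSoup(driver.page_source, 'html5lib')
        table = soup.find_all(class_='engineTable')[2]   # same table index as in the answer
        for row in table.find_all('tr'):
            cells = [td.get_text(strip=True) for td in row.find_all('td')]
            if cells:                      # header rows contain only <th>, so skip them
                writer.writerow(cells)
        try:
            driver.find_element_by_partial_link_text('Next').click()
        except NoSuchElementException:     # no "Next" link on the last page
            break

driver.quit()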