I've just started using Selenium to scrape a table from a webpage, and I implemented the page navigation with Selenium. However, the result keeps looping when I run the code. I'm pretty sure I wrote the code wrong. What should I fix so that the Selenium navigation works?

import requests
import csv
from bs4 import BeautifulSoup as bs
from selenium import webdriver

browser=webdriver.Chrome()
browser.get('https://dir.businessworld.com.my/15/posts/16-Computers-The-Internet')

# url = requests.get("https://dir.businessworld.com.my/15/posts/16-Computers-The-Internet/")
soup=bs(browser.page_source)

filename = "C:/Users/User/Desktop/test.csv"
csv_writer = csv.writer(open(filename, 'w'))

pages_remaining = True

while pages_remaining:
    for tr in soup.find_all("tr"):
        data = []
        # for headers ( entered only once - the first time - )
        for th in tr.find_all("th"):
            data.append(th.text)
        if data:
            print("Inserting headers : {}".format(','.join(data)))
            csv_writer.writerow(data)
            continue

        for td in tr.find_all("td"):
            if td.a:
                data.append(td.a.text.strip())
            else:
                data.append(td.text.strip())
        if data:
            print("Inserting data: {}".format(','.join(data)))
            csv_writer.writerow(data)

try:
    #Checks if there are more pages with links
    next_link = driver.find_element_by_xpath('//*[@id="content"]/div[3]/table/tbody/tr/td[2]/table/tbody/tr/td[6]/a ]')
    next_link.click()
    time.sleep(30)
except NoSuchElementException:
    rows_remaining = False

1 Answer

Check whether a Next button is present on the page; if so, click it, otherwise exit the while loop.

if len(browser.find_elements_by_xpath("//a[contains(.,'Next')]"))>0:
      browser.find_element_by_xpath("//a[contains(.,'Next')]").click()
else:
      break

There is no need to use time.sleep(); use WebDriverWait() instead.
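To illustrate why an explicit wait is usually preferable to a fixed sleep: it polls a condition and returns as soon as the condition holds, failing only if the timeout passes. The `wait_until` helper below is a hypothetical pure-Python sketch of that idea, not Selenium's implementation; its name, parameters, and defaults are assumptions made for illustration.

```python
import time


def wait_until(condition, timeout=10.0, poll=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Hypothetical helper that only mimics the general idea behind an
    explicit wait; it is not part of Selenium.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            # Return immediately once the condition holds, instead of
            # always sleeping for a fixed amount of time.
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within {} s".format(timeout))
        time.sleep(poll)
```

With a real browser, WebDriverWait(browser, 10).until(...) plays this role, so the script continues as soon as the table is visible rather than after a fixed 30 seconds.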


Code:

import csv
from bs4 import BeautifulSoup as bs
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By

browser=webdriver.Chrome()
browser.get('https://dir.businessworld.com.my/15/posts/16-Computers-The-Internet')
WebDriverWait(browser, 10).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "table.postlisting")))
soup=bs(browser.page_source)

filename = "C:/Users/User/Desktop/test.csv"
csv_writer = csv.writer(open(filename, 'w'))

pages_remaining = True

while pages_remaining:
    WebDriverWait(browser,10).until(EC.visibility_of_element_located((By.CSS_SELECTOR,"table.postlisting")))
    for tr in soup.find_all("tr"):
        data = []
        # for headers ( entered only once - the first time - )
        for th in tr.find_all("th"):
            data.append(th.text)
        if data:
            print("Inserting headers : {}".format(','.join(data)))
            csv_writer.writerow(data)
            continue

        for td in tr.find_all("td"):
            if td.a:
                data.append(td.a.text.strip())
            else:
                data.append(td.text.strip())
        if data:
            print("Inserting data: {}".format(','.join(data)))
            csv_writer.writerow(data)


    if len(browser.find_elements_by_xpath("//a[contains(.,'Next')]"))>0:
        browser.find_element_by_xpath("//a[contains(.,'Next')]").click()
    else:
        break

5 Comments

It didn't work. I copied the XPath for the 'Next' button, which is //*[@id="content"]/div[3]/table/tbody/tr/td[2]/table/tbody/tr/td[4]/a, and substituted it for "//a[contains(.,'Next')]", but that didn't work either. How should I change the XPath so that the Selenium navigation works?
Please just copy the entire code. I have tested it, and it clicks the Next button. You only have 2 pages of data, right?
Do you know why it keeps scraping page 1 in a loop and doesn't scrape the other pages?
You said before that it was working as expected. However, it is clicking each Next button found on the webpage. I haven't checked your data.
I just checked the data. Thanks to you, the pagination works; it's just that Selenium keeps scraping the same page even after it has navigated to another page.
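
The symptom in the last comment is consistent with soup being built once, before the while loop, in the code above: browser.page_source is read a single time, so every pass of the loop walks the rows of page 1 again. A minimal sketch of one possible fix is to re-read the page source on every iteration. The function name `scrape_all_pages`, the `parse_rows` and `after_click` callbacks, and the `max_pages` guard are all hypothetical names introduced for illustration; the locator string "xpath" is the value of By.XPATH, and `driver.find_elements(by, value)` is the form used by newer Selenium releases in place of find_elements_by_xpath.

```python
def scrape_all_pages(driver, parse_rows, writer,
                     next_xpath="//a[contains(.,'Next')]",
                     after_click=None, max_pages=50):
    """Write the rows of every page, following the Next link each time.

    driver      -- a Selenium WebDriver (anything exposing page_source
                   and find_elements works)
    parse_rows  -- hypothetical callback: HTML string -> iterable of rows
    writer      -- a csv.writer
    after_click -- optional hypothetical callback that waits for the new page
    max_pages   -- assumed safety cap in case a Next link never disappears
    """
    pages = 0
    while pages < max_pages:
        # Re-read the source of the page the browser is on *now*.
        # Parsing once before the loop would keep yielding page 1.
        for row in parse_rows(driver.page_source):
            writer.writerow(row)
        pages += 1

        # "xpath" is the string value of selenium's By.XPATH.
        next_links = driver.find_elements("xpath", next_xpath)
        if not next_links:
            break
        next_links[0].click()
        if after_click is not None:
            after_click(driver)
    return pages
```

Here parse_rows could wrap the existing BeautifulSoup row loop, and after_click could hold a WebDriverWait call (for example, waiting for table.postlisting to be visible again) so the next iteration does not read the old page.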
