I am new to web scraping and am trying to scrape information about water utilities from this site. I can successfully navigate through each region via a drop-down and access the first page of results, but I cannot navigate through the remaining pages of a region before moving on to the next one. The page navigation bar is a list with no 'Next' button, so I try to iterate through the list items using range(), but the length I get for the list is not correct, and as it stands I only ever reach the first page of each region. I am struggling to figure out what I am doing wrong or what to consider, even after looking for answers in similar questions. Any help would be highly appreciated.
Thanks!
Here is my current code (I have left out the actual scraping and focused on navigating the pages):
import time
import pandas as pd
from selenium import webdriver
from selenium.webdriver.support.ui import Select, WebDriverWait
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException, WebDriverException
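# Utilities search page; the results are rendered in a grid with id MainContent_gvUtilities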
url = 'https://database.ib-net.org/search_utilities?type=2'
browser = webdriver.Firefox()
browser.get(url)
time.sleep(3)
print("Retriving the site...")
# All regions available
regions = ['Africa', 'East Asia and Pacific', 'Europe and Central Asia', 'Latin America (including USA and Canada)', 'Middle East and Northern Africa', 'South Asia']
for region in regions:
    # Select all options from drop down menu
    selectOption = Select(browser.find_element_by_id('MainContent_ddRegion'))
    print("Now constructing output for: " + region)
    # Select table and wait for data to populate
    selectOption.select_by_visible_text(region)
    time.sleep(4)
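    # The pager is the <ul> of page links in the last row (tr[52]) of the results grid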
    list_of_table_pages = browser.find_element_by_xpath('//*[@id="MainContent_gvUtilities"]/tbody/tr[52]/td/ul')
    no_pages = len(list_of_table_pages.find_elements_by_xpath("//li"))
    print("No of table pages to be scraped are: %d" % no_pages)
    print("Outputting data into " + region + ".csv...")
    all_table_data = []
    # starts the range count from 1 instead of 0
    for page in range(1, no_pages):
        try:
            # Navigate to the next page once done
            table_page = str(page)
            WebDriverWait(browser, 20).until(EC.visibility_of_element_located((By.XPATH, '//*[@id="MainContent_gvUtilities"]/tbody/tr[52]/td/ul/li[' + table_page + ']/a'))).click()
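            # Clicking a pager link reloads the results table (the grid appears to be an ASP.NET control), so the link is re-located on every iteration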
print("Navigating to next table page...")
except (TimeoutException, WebDriverException):
print("Last page reached, moving to the next region...")
break
print("No more pages to scrape under %s. Moving to the next region..." %region)
browser.close()
browser.quit()
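To make the flow I am aiming for clearer, here is a stripped-down sketch of the pagination loop for a single region. The locators are the same ones assumed in the code above, './/li' reflects my (possibly wrong) understanding of element-scoped XPath, and I am assuming every page gets its own <li> entry in the pager; I have not confirmed any of this against the live site:

# Sketch only: count the pager items relative to the pager itself and
# click each remaining page link in turn. Assumes the pager <ul> keeps
# the same structure after every click; not verified against the site.
pager_xpath = '//*[@id="MainContent_gvUtilities"]/tbody/tr[52]/td/ul'
pager = browser.find_element_by_xpath(pager_xpath)
no_pages = len(pager.find_elements_by_xpath('.//li'))  # only this pager's <li> items
for page in range(2, no_pages + 1):  # page 1 is already displayed
    page_link = pager_xpath + '/li[' + str(page) + ']/a'
    WebDriverWait(browser, 20).until(
        EC.element_to_be_clickable((By.XPATH, page_link))
    ).click()
    time.sleep(2)  # give the reloaded table time to render
    # ... scrape the table on this page here ...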