I am trying to scrape a webpage and the links within that webpage. The webpage is https://webgate.ec.europa.eu/rasff-window/screen/list. As you can see, there are 6000+ notifications, and each notification has a separate link associated with it. I want to store all of these links in a list. I am doing this with the following code:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import time
from webdriver_manager.chrome import ChromeDriverManager
d = webdriver.Chrome(ChromeDriverManager().install())
#trying this scraping for multiple pages
links = []
i = 1
elems = d.find_elements_by_xpath("//a[@href]")
for elem in elems:
    link_list = elem.get_attribute("href")
    links.append(link_list)

while True:
    print("This is the now the {} page".format(i))
    i += 1
    time.sleep(1)
    try:
        time.sleep(0.5)
        WebDriverWait(d, 10).until(EC.element_to_be_clickable((By.XPATH, "//button[@aria-label='Next page']"))).click()
        print("we have clicked it once")
        time.sleep(0.9)
        elems2 = d.find_elements_by_xpath("//a[@href]")
        for elem2 in elems2:
            link_list = elem2.get_attribute("href")
            links.append(link_list)
        print("The button is clickable")
        time.sleep(1)
    except:
        print("The button is now not clickable, we have collected all the links")
        break
The idea is to use Selenium to collect all the href links from the current page, click the next-page button, and repeat, which is what my while loop does. But when I run this code it does not complete the entire loop. For example, if there are about 6400 notifications I expect it to run until the 64th page, but it stops somewhere in the middle and falls into the except branch, suggesting that the next button is not clickable, even though in reality the button is clickable. This happens on random pages, and I have tried changing the time.sleep values as well. Is there something wrong with what I am doing?
The button may need a different kind of click (for example via an ActionChain). You should observe when you get the error and see what you have in the browser at that moment. Instead of a bare except: you should rather use except Exception as ex: print(ex) to see what the problem really is. Also, the posted code never calls d.get(url), so make sure the page is actually opened before scraping.
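Putting that advice together, here is a minimal sketch (not a tested fix) of how the loop could look: it opens the list page with d.get(), waits for links to be present before collecting them, scrolls to the next-page button and clicks it via ActionChains, and distinguishes a TimeoutException (assumed here to mean the last page has been reached) from other errors, which are printed instead of being swallowed by a bare except. The URL and the //button[@aria-label='Next page'] locator come from the question; the extra waits, the ActionChains click, and the last-page assumption are guesses about the page's behaviour.

from selenium import webdriver
from selenium.webdriver import ActionChains
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException
from webdriver_manager.chrome import ChromeDriverManager
import time

d = webdriver.Chrome(ChromeDriverManager().install())
d.get("https://webgate.ec.europa.eu/rasff-window/screen/list")  # open the page first

links = []
page = 1

while True:
    # wait until the page has rendered at least one link, then collect every href
    WebDriverWait(d, 10).until(EC.presence_of_all_elements_located((By.XPATH, "//a[@href]")))
    for elem in d.find_elements_by_xpath("//a[@href]"):
        links.append(elem.get_attribute("href"))
    print("Collected links from page {}".format(page))

    try:
        # wait for the next-page button, scroll to it, then click it via ActionChains
        button = WebDriverWait(d, 10).until(
            EC.element_to_be_clickable((By.XPATH, "//button[@aria-label='Next page']")))
        d.execute_script("arguments[0].scrollIntoView(true);", button)
        ActionChains(d).move_to_element(button).click(button).perform()
        page += 1
        time.sleep(1)  # blunt wait for the next batch of results, as in the question
    except TimeoutException:
        # assumption: no clickable next-page button within 10 s means this was the last page
        print("No next-page button after page {}, stopping".format(page))
        break
    except Exception as ex:
        # print the real error instead of silently treating it as the last page
        print("Unexpected error on page {}: {}".format(page, ex))
        break

print("Total links collected: {}".format(len(links)))

With the exception printed, you should at least see whether the failure is a timeout, an intercepted click, or a stale element, which narrows down where extra waiting or scrolling is actually needed.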