Issue while web scraping with Selenium and Python

Question

I'm trying to scrape this website:

https://maroof.sa/BusinessType/BusinessesByTypeList?bid=14&sortProperty=BestRating&DESC=True

There is a button to load more content. When you click it, more content is displayed without the URL changing. I wrote a piece of code that first loads all the content, then extracts the URLs of the items I need, and then visits each link to scrape the data.

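from time import sleep

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

url = "https://maroof.sa/BusinessType/BusinessesByTypeList?bid=26&sortProperty=BestRating&DESC=True"
driver = webdriver.Chrome()
driver.get(url)
# button = driver.find_element_by_xpath('//*[@id="loadMore"]/button')
num = 1
while num <= 507:
    sleep(4)
    # Wait for the "load more" button to become clickable, then click it.
    button = WebDriverWait(driver, 50).until(EC.element_to_be_clickable((By.XPATH, '//*[@id="loadMore"]/button')))
    button.click()
    print(num)
    num += 1
# Collect the href of every entry once all content has been loaded.
links = [l.get_attribute('href') for l in WebDriverWait(driver, 40).until(EC.visibility_of_all_elements_located((By.XPATH, '//*[@id="list"]/a')))]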

It seems to work, but sometimes it doesn't click the button that loads the content. It accidentally clicks on something else, which raises an error, and I have to start over again. Can you help me?

If it throws an error, just use a try/except. You could initialize a boolean and loop until it is true, i.e. keep repeating the try/except until the click no longer throws an error (and so no longer triggers the except).
– Pete Marise, Commented Feb 22, 2020 at 21:33
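
The suggestion in this comment could be sketched roughly as follows. This is only a minimal illustration, not code from the thread: the helper name `retry_until_success` and its parameters (`max_attempts`, `delay`, `exceptions`) are made up for the example, and the attempt limit is an added assumption so the loop cannot run forever.

```python
import time


def retry_until_success(action, max_attempts=5, delay=1.0, exceptions=(Exception,)):
    """Call action() repeatedly until it stops raising one of `exceptions`.

    Hypothetical helper: the boolean flag mirrors the comment's idea, and
    max_attempts is an assumed safety limit that re-raises the last error.
    """
    succeeded = False
    attempts = 0
    result = None
    while not succeeded:
        attempts += 1
        try:
            result = action()
            succeeded = True  # no exception, so leave the loop
        except exceptions:
            if attempts >= max_attempts:
                raise  # give up and surface the last error
            time.sleep(delay)  # short pause before trying again
    return result


# Possible use with the question's click (needs a live driver, so left as a comment):
# from selenium.common.exceptions import ElementClickInterceptedException
# retry_until_success(
#     lambda: WebDriverWait(driver, 50).until(
#         EC.element_to_be_clickable((By.XPATH, '//*[@id="loadMore"]/button'))
#     ).click(),
#     exceptions=(ElementClickInterceptedException,),
# )
```

Wrapping only the click keeps the page state that has already been loaded, so a single failed click does not force the whole run to start over.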

undetected Selenium · Accepted Answer · 2020-02-22 22:48:21Z


To scrape the website by clicking the button that loads more content, you need to induce WebDriverWait for element_to_be_clickable(), and you can use the following locator strategy:

from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

options = webdriver.ChromeOptions()
options.add_argument("start-maximized")
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option('useAutomationExtension', False)
# Selenium 3 style; Selenium 4 passes the driver path through a Service object instead of executable_path.
driver = webdriver.Chrome(options=options, executable_path=r'C:\WebDrivers\chromedriver.exe')
driver.get('https://maroof.sa/BusinessType/BusinessesByTypeList?bid=26&sortProperty=BestRating&DESC=True')
while True:
    try:
        # Scroll the "load more" button into view, then click it once it is clickable.
        driver.execute_script("return arguments[0].scrollIntoView(true);", WebDriverWait(driver, 5).until(EC.visibility_of_element_located((By.XPATH, "//button[@class='btn btn-primary']"))))
        WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//button[@class='btn btn-primary']"))).click()
    except TimeoutException:
        # The button no longer shows up within the timeout: stop loading.
        break
# Collect the href of every loaded entry.
print([l.get_attribute('href') for l in WebDriverWait(driver, 10).until(EC.visibility_of_all_elements_located((By.XPATH, '//*[@id="list"]/a')))])
driver.quit()
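
As a possible variation on the loop above, and not something proposed in the thread, the click can be issued through JavaScript on the button element itself, and the loop can stop when the link count stops growing instead of relying on a fixed number of clicks. The sketch below makes several assumptions: the function name `load_all_links` and the parameters `max_clicks`, `pause` and `patience` are hypothetical, the two XPath locators are the ones from the question, and the string `"xpath"` is used in place of `By.XPATH` (which has that value) so the function only depends on the driver object passed in.

```python
import time

LOAD_MORE_XPATH = '//*[@id="loadMore"]/button'  # locator taken from the question
LINKS_XPATH = '//*[@id="list"]/a'               # locator taken from the question


def load_all_links(driver, max_clicks=1000, pause=1.0, patience=3):
    """Click "load more" until the button is gone or no new links appear.

    Hypothetical helper: max_clicks, pause and patience are assumed knobs.
    """
    stalled = 0
    seen = len(driver.find_elements("xpath", LINKS_XPATH))
    for _ in range(max_clicks):
        buttons = driver.find_elements("xpath", LOAD_MORE_XPATH)
        if not buttons:
            break  # no button found: assume everything is loaded
        # A JavaScript click is dispatched on the element itself, so an
        # overlapping element does not receive it by accident.
        driver.execute_script("arguments[0].click();", buttons[0])
        time.sleep(pause)  # give the page time to append new entries
        count = len(driver.find_elements("xpath", LINKS_XPATH))
        if count > seen:
            seen, stalled = count, 0
        else:
            stalled += 1
            if stalled >= patience:
                break  # several clicks in a row added nothing
    links = [a.get_attribute("href") for a in driver.find_elements("xpath", LINKS_XPATH)]
    return list(dict.fromkeys(links))  # drop duplicates, keep page order
```

With a real driver this would be called as `links = load_all_links(driver)` after `driver.get(...)`. Whether the site removes the button once all entries are loaded has not been verified, which is why the stalled-count check is included as a second stop condition.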


2 Comments

Hosam Gamal Over a year ago

I will try this solution

undetected Selenium Over a year ago

@HosamGamal Great, let me know the status of your execution.
