
I'm trying to scrape this website:

https://maroof.sa/BusinessType/BusinessesByTypeList?bid=14&sortProperty=BestRating&DESC=True

There is a button that loads more content: when you click it, more results appear without the URL changing. I wrote code that first loads all the content, then extracts the URLs of the entries I need, and finally visits each link and scrapes its data:

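from time import sleep
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

url = "https://maroof.sa/BusinessType/BusinessesByTypeList?bid=26&sortProperty=BestRating&DESC=True"
driver = webdriver.Chrome()
driver.get(url)
# button = driver.find_element_by_xpath('//*[@id="loadMore"]/button')
num = 1
# Click the "load more" button a fixed 507 times
while num <= 507:
    sleep(4)
    button = WebDriverWait(driver, 50).until(EC.element_to_be_clickable((By.XPATH, '//*[@id="loadMore"]/button')))
    button.click()
    print(num)
    num += 1
# Collect the href of every result link on the fully loaded page
links = [l.get_attribute('href') for l in WebDriverWait(driver, 40).until(EC.visibility_of_all_elements_located((By.XPATH, '//*[@id="list"]/a')))]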
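The last step described above, visiting each collected link, is where a single failure currently forces a full restart. A minimal sketch, assuming you supply your own per-page function (scrape_one, unique_links and scrape_all are hypothetical names, not part of Selenium): it drops empty or duplicate hrefs, records failures instead of aborting, and can resume from results you already have.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple


def unique_links(hrefs: Iterable[Optional[str]]) -> List[str]:
    """Drop empty hrefs (get_attribute can return None) and duplicates, keeping page order."""
    seen = set()
    result = []
    for href in hrefs:
        if href and href not in seen:
            seen.add(href)
            result.append(href)
    return result


def scrape_all(
    links: Iterable[str],
    scrape_one: Callable[[str], dict],
    already_done: Optional[Dict[str, dict]] = None,
) -> Tuple[Dict[str, dict], Dict[str, str]]:
    """Scrape every link, recording failures instead of aborting the whole run.

    scrape_one is a hypothetical function you write yourself,
    e.g. driver.get(url) followed by find_element calls.
    Pass the results of an earlier run as already_done to skip those links.
    """
    results: Dict[str, dict] = dict(already_done or {})
    errors: Dict[str, str] = {}
    for url in links:
        if url in results:
            continue  # scraped on an earlier pass
        try:
            results[url] = scrape_one(url)
        except Exception as exc:  # keep going; retry the failures later
            errors[url] = repr(exc)
    return results, errors
```

If the detail pages turn out to be server-rendered (not verified here), scrape_one could use requests instead of the browser.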

It seems to work, but sometimes the click doesn't land on the button that loads the content: it accidentally clicks something else, which raises an error, and I have to start over again. Can you help me?
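If the misclick shows up as Selenium's ElementClickInterceptedException (an assumption; the question doesn't quote the error), one common workaround is to fall back to a JavaScript click, which is dispatched on the button element itself, so an overlapping element can't receive it. A minimal sketch; click_with_js_fallback is a hypothetical helper name:

```python
from typing import Type


def click_with_js_fallback(driver, element, intercepted_exc: Type[BaseException]) -> str:
    """Try a normal click; if it is intercepted, click the element via JavaScript.

    Returns "native" or "javascript" so you can log which path was taken.
    """
    try:
        element.click()
        return "native"
    except intercepted_exc:
        # HTMLElement.click() fires on the element directly, bypassing
        # whatever element is currently drawn on top of it
        driver.execute_script("arguments[0].click();", element)
        return "javascript"


# Hypothetical usage with Selenium:
# from selenium.common.exceptions import ElementClickInterceptedException
# button = WebDriverWait(driver, 50).until(
#     EC.element_to_be_clickable((By.XPATH, '//*[@id="loadMore"]/button')))
# click_with_js_fallback(driver, button, ElementClickInterceptedException)
```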

  • If it throws an error, just use a try/except. You could initialize a boolean and loop until it's true, i.e. keep going through the try/except until the call no longer throws an error (and thereby no longer triggers the except). Commented Feb 22, 2020 at 21:33
  • Did you try using requests? Commented Feb 23, 2020 at 5:16
  • No, I didn't try requests, but I will now. Commented Feb 23, 2020 at 15:27
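The first comment's idea, looping through a try/except until the call stops throwing, can be sketched as a small helper. This version caps the number of attempts instead of looping on a boolean forever, and only retries the exception types you list; retry is a hypothetical name:

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry(
    action: Callable[[], T],
    exceptions: Tuple[Type[BaseException], ...],
    attempts: int = 3,
    delay: float = 1.0,
) -> T:
    """Call action until it succeeds, retrying only on the given exception types."""
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except exceptions:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            time.sleep(delay)
    raise ValueError("attempts must be at least 1")


# Hypothetical usage with Selenium:
# from selenium.common.exceptions import (
#     ElementClickInterceptedException, StaleElementReferenceException)
# retry(lambda: WebDriverWait(driver, 20).until(EC.element_to_be_clickable(
#           (By.XPATH, '//*[@id="loadMore"]/button'))).click(),
#       (ElementClickInterceptedException, StaleElementReferenceException))
```

Re-locating the button inside the lambda on every attempt matters: a stale reference from a previous attempt would fail again.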

1 Answer


To scrape the website by clicking the button that loads more content, use WebDriverWait with element_to_be_clickable() and keep clicking until the wait times out (meaning the button is gone). You can use the following locator strategy:

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException

options = webdriver.ChromeOptions()
options.add_argument("start-maximized")
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option('useAutomationExtension', False)
# Selenium 4 takes the driver path through a Service object
# (the old executable_path argument has been removed)
driver = webdriver.Chrome(service=Service(r'C:\WebDrivers\chromedriver.exe'), options=options)
driver.get('https://maroof.sa/BusinessType/BusinessesByTypeList?bid=26&sortProperty=BestRating&DESC=True')
# Keep clicking "load more" until the button stops appearing
while True:
    try:
        # Scroll the button into view before clicking it
        driver.execute_script("return arguments[0].scrollIntoView(true);", WebDriverWait(driver, 5).until(EC.visibility_of_element_located((By.XPATH, "//button[@class='btn btn-primary']"))))
        WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//button[@class='btn btn-primary']"))).click()
    except TimeoutException:
        # No visible/clickable button within the timeout: all content is loaded
        break
print([l.get_attribute('href') for l in WebDriverWait(driver, 10).until(EC.visibility_of_all_elements_located((By.XPATH, '//*[@id="list"]/a')))])
driver.quit()
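The loop above relies on the wait timing out once the button is gone. A sketch of an extra safeguard, assuming you can count the loaded result links: stop as soon as a click adds nothing new, and cap the total number of clicks. The callables are placeholders for your own Selenium code; the commented wiring is illustrative and not part of the answer:

```python
from typing import Callable


def load_all(
    click_more: Callable[[], bool],
    count_after: Callable[[int], int],
    max_clicks: int = 1000,
) -> int:
    """Click "load more" until the button disappears or stops adding items.

    click_more returns False when the button is gone.
    count_after(prev) returns the current item count, ideally after waiting
    briefly for it to exceed prev. Returns the number of clicks made.
    """
    clicks = 0
    previous = count_after(-1)  # any count exceeds -1, so no waiting here
    while clicks < max_clicks:
        if not click_more():
            break  # button no longer present
        clicks += 1
        current = count_after(previous)
        if current <= previous:
            break  # the click loaded nothing new: stop instead of looping
        previous = current
    return clicks


# Illustrative Selenium wiring (hypothetical):
# LINKS_XPATH = '//*[@id="list"]/a'
# def click_more():
#     try:
#         WebDriverWait(driver, 20).until(EC.element_to_be_clickable(
#             (By.XPATH, "//button[@class='btn btn-primary']"))).click()
#         return True
#     except TimeoutException:
#         return False
# def count_after(prev):
#     try:
#         WebDriverWait(driver, 10).until(
#             lambda d: len(d.find_elements(By.XPATH, LINKS_XPATH)) > prev)
#     except TimeoutException:
#         pass  # nothing new arrived in time
#     return len(driver.find_elements(By.XPATH, LINKS_XPATH))
```

Waiting for the link count to grow also replaces the fixed sleep(4) in the question's loop.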

2 Comments

I will try this solution.
@HosamGamal Great, let me know the status of your execution.
