Loading all elements using Selenium/Python- Load more

Question

I am trying to scrape all of the perfumes which are located at https://www.fragrantica.com/search/

There are roughly 73,367 perfumes on the site and I want to load all of them. The problem is that the site shows 30 perfumes at a time, and you have to click 'show more results' to load another 30, and so on. So we would have to press the button roughly 2,444 times to reach the end of the page and have all perfumes loaded. This is my code so far:

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
import time


options = Options()
options.add_argument("--profile-directory=Default")
options.add_argument('--disable-blink-features=AutomationControlled')
options.add_argument("start-maximized")
options.add_argument("user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/102.0.0.0 Safari/537.36")


driver = webdriver.Chrome(service = Service(executable_path='C:/Users/armon/Downloads/chromedriver_win32/chromedriver.exe'), options=options)
url = 'https://www.fragrantica.com/'
driver.get(url)

time.sleep(3)

perfumes_btn = driver.find_element(by=By.XPATH, value = '//*[@id="offCanvasLeft"]/ul/li[5]/a')
perfumes_btn.click()

search_btn = WebDriverWait(driver,20).until(EC.element_to_be_clickable((By.XPATH,'/html/body/div[2]/div[2]/div[2]/ul/li[5]/ul/li[1]/a')))
search_btn.click()


load_more_btn = '//*[@id="main-content"]/div[1]/div[1]/div/div/div/div[2]/div[1]/div/div[3]/div/div/div/div/div/button'
loadingButton = WebDriverWait(driver,70).until(EC.element_to_be_clickable((By.XPATH,load_more_btn)))

maxPerfumes = 73346
i = 0

while loadingButton:
    driver.execute_script("arguments[0].click();", loadingButton)
    i = i+1
    print(i)    
    loadingButton = WebDriverWait(driver,70).until(EC.element_to_be_clickable((By.XPATH,load_more_btn)))
    loadElems = driver.find_elements(by = By.XPATH, value = '//*[@id="main-content"]/div[1]/div[1]/div/div/div/div[2]/div[1]/div/div[3]/div/div/div/span/div[1]')
    if len(loadElems)>0:
        loadingButton = WebDriverWait(driver,70).until(EC.element_to_be_clickable((By.XPATH,load_more_btn)))
        PerfumesLoaded = len(driver.find_elements(by = By.XPATH, value = '//*[@id="main-content"]/div[1]/div[1]/div/div/div/div[2]/div[1]/div/div[3]/div/div/div/span/div[1]'))
    else:
        print("Loaded all the tires")
        break
    if PerfumesLoaded >= maxPerfumes:
        print(f"{PerfumesLoaded} are loaded successfully.")
        break

The problem is that each time I run it, it makes at most 34 loops, then stops and throws an error:

TimeoutException                          Traceback (most recent call last)
c:\Users\armon\OneDrive\Desktop\OLD\Data Analytics\Portfolio Projects\Jolse Project\Jolse Scraping Notebook.ipynb Cell 1' in <cell line: 36>()
     41 loadElems = driver.find_elements(by = By.XPATH, value = '//*[@id="main-content"]/div[1]/div[1]/div/div/div/div[2]/div[1]/div/div[3]/div/div/div/span/div[1]')
     42 if len(loadElems)>0:
---> 43     loadingButton = WebDriverWait(driver,70).until(EC.element_to_be_clickable((By.XPATH,load_more_btn)))
     44     PerfumesLoaded = len(driver.find_elements(by = By.XPATH, value = '//*[@id="main-content"]/div[1]/div[1]/div/div/div/div[2]/div[1]/div/div[3]/div/div/div/span/div[1]'))
     45 else:

File c:\Users\armon\anaconda3\envs\armoniaenv\lib\site-packages\selenium\webdriver\support\wait.py:87, in WebDriverWait.until(self, method, message)
     85     if time.monotonic() > end_time:
     86         break
---> 87 raise TimeoutException(message, screen, stacktrace)

TimeoutException: Message:

The print(i) in the loop is only there to show how many iterations ran. Any suggestions on how to tweak it, or am I doing something wrong?

My goal is to click 'load more' until it is no longer possible, so that I can then access all the perfumes on the page. TIA

Hey @Armonia did the tips in my answer help you?

– Wonka, Commented Jul 1, 2022 at 12:00

Wonka · Accepted Answer · 2022-07-01 12:57:07Z

1

I solved it by wrapping the element lookup in a try-except. You can also change your code slightly so that it collects the results only once, after the while loop ends.

try:
    loadingButton = WebDriverWait(driver,70).until(EC.element_to_be_clickable((By.XPATH,load_more_btn)))
except Exception as e:
    print("can't locate element: %s" % e)
    loadingButton = False

# My code to solve this
from time import sleep

more = True
while more:
    # my_custom_function tries to find the element by XPath and click it;
    # on an exception it returns False
    more = my_custom_function("//XX")

    if more:
        # Sleep is needed because the page changes after clicking "load more",
        # and the Python code can run faster than the website loads.
        # The sleep can also go inside my_custom_function.
        sleep(1)
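
The answer does not show my_custom_function, so here is a minimal sketch of what such a helper might look like. The names make_load_more_clicker and click_until_gone, the 10-second timeout, and the max_clicks safety cap are all assumptions made for illustration, not part of the original answer. The sketch catches TimeoutException specifically, so other errors still surface.

```python
import time


def make_load_more_clicker(driver, xpath, timeout=10):
    """Return a zero-argument function that clicks the button once.

    The returned function gives True after a click and False when the
    button did not become clickable within `timeout` seconds.
    (Hypothetical helper; name and defaults are assumptions.)
    """
    # Selenium is imported here so the loop below stays usable (and testable)
    # without a browser installed.
    from selenium.common.exceptions import TimeoutException
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    def click_once():
        try:
            button = WebDriverWait(driver, timeout).until(
                EC.element_to_be_clickable((By.XPATH, xpath))
            )
        except TimeoutException:
            return False
        # JavaScript click, the same approach the question uses.
        driver.execute_script("arguments[0].click();", button)
        return True

    return click_once


def click_until_gone(click_once, pause=1.0, max_clicks=5000):
    """Call click_once until it returns False; return the number of clicks.

    max_clicks is a safety cap (an assumption) so the loop cannot run forever.
    """
    clicks = 0
    while clicks < max_clicks and click_once():
        clicks += 1
        time.sleep(pause)  # give the page time to append the new results
    return clicks
```

With the question's variables this could be called as click_until_gone(make_load_more_clicker(driver, load_more_btn)), and the perfume elements counted once with driver.find_elements after it returns, rather than on every iteration.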

edited Jul 1, 2022 at 12:57

answered Jul 1, 2022 at 10:35


7 Comments

Armonia Over a year ago

I added the sleep command but still no improvement :/

Wonka Over a year ago

Are you using the try-except to avoid a crash when there is no element?

Armonia Over a year ago

Yes, I do use it. It only made one loop and stopped.

Wonka Over a year ago

And there should be more "Load more" buttons, right? Try adding more sleep if the website loads slowly, or check what error you get in the exception. I will edit this answer; check the except block there to see how to get the error if you don't know how.

Armonia Over a year ago

It always gives me a timeout error, even with sufficient sleep. The weird thing is that it always stops at loop number 34, lol. When I check the site to see what the issue is, the load more button looks faded, and when I hover over it, it shows a 🚫 sign. No clue why.
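
A faded button with a 🚫 cursor suggests the button is still on the page but no longer enabled. In that case element_to_be_clickable can never succeed and the wait times out. The sketch below is one possible way to tell the cases apart before waiting. The function name load_more_state is hypothetical, and it assumes the site marks the button with the HTML disabled attribute; if the fading is done purely with CSS, is_enabled() may still return True.

```python
def load_more_state(driver, xpath):
    """Classify the button as 'missing', 'hidden', 'disabled', or 'clickable'.

    Hypothetical helper for diagnosing why a wait times out.
    """
    # "xpath" is the value of selenium's By.XPATH constant.
    buttons = driver.find_elements("xpath", xpath)
    if not buttons:
        return "missing"      # button no longer in the DOM
    button = buttons[0]
    if not button.is_displayed():
        return "hidden"       # present but not visible
    if not button.is_enabled():
        return "disabled"     # visible but disabled, so it will never be clickable
    return "clickable"
```

Printing load_more_state(driver, load_more_btn) right after the loop stops would show whether the button disappeared or was disabled by the site.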
