How can I infinite-scroll a web page using selenium webdriver in python

Question

I’m new to Selenium and I’m trying to figure out how to scroll a page infinitely. I tried almost everything that other Stack Overflow answers suggested:

1.

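import time

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--headless')
chrome_options.add_argument('--no-sandbox')
chrome_options.add_argument('--disable-dev-shm-usage')
chrome_options.add_argument("user-agent=Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0) like Gecko")
# newer Selenium 4 releases no longer accept the driver path as a positional argument
driver = webdriver.Chrome(service=Service('chromedriver'), options=chrome_options)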
driver.set_window_size(1320, 550)

exchange_link = "https://icodrops.com/ico-stats/"
driver.get(exchange_link)
wait = WebDriverWait(driver, 10)

SCROLL_PAUSE_TIME = 0.5

# Get scroll height
last_height = driver.execute_script("return document.body.scrollHeight")

while True:
    # Scroll down to bottom
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")

    # Wait to load page
    time.sleep(SCROLL_PAUSE_TIME)

    # Calculate new scroll height and compare with last scroll height
    new_height = driver.execute_script("return document.body.scrollHeight")
    if new_height == last_height:
        break
    last_height = new_height

2.

from selenium.webdriver.common.keys import Keys
Number = wait.until(EC.presence_of_element_located((By.XPATH,'html[1]/body[1]/div[1]/div[1]/div[1]/main[1]/div[1]/div[4]/div[2]/div[1]/div[1]/div[1]')))
lastElement = Number.find_elements(By.XPATH,'div')[-1]
lastElement.send_keys(Keys.NULL)

3.

Number = wait.until(EC.presence_of_element_located((By.XPATH,'html[1]/body[1]/div[1]/div[1]/div[1]/main[1]/div[1]/div[4]/div[2]/div[1]/div[1]/div[1]')))
lastElement = Number.find_elements(By.XPATH,'div')[-1]
lastElement.location_once_scrolled_into_view

and a few other variations, for example:

driver.execute_script("var scrollingElement = (document.scrollingElement || document.body);scrollingElement.scrollTop = scrollingElement.scrollHeight;")

driver.execute_script("document.getElementById('mydiv').scrollIntoView();")

I don't know what else I can do; I have spent a lot of time trying to fix it.

Thanks for all the replies, but nothing works. I tried two things, including this one:

from selenium.common.exceptions import TimeoutException

j = 1  # XPath positions are 1-based
while True:
    if j == 900:
        break

    try:
        ele = wait.until(EC.visibility_of_element_located((By.XPATH, f"(//div[@id='market-ico-stat-container']/div)[{j}]")))
        driver.execute_script("arguments[0].scrollIntoView(true);", ele)
        ico_name = wait.until(EC.presence_of_element_located((By.XPATH, f'/html[1]/body[1]/div[1]/div[1]/div[1]/main[1]/div[1]/div[5]/div[2]/div[1]/div[1]/div[1]/div[{j}]/a[1]/div[1]/div[1]/div[2]/h3/a'))).get_attribute("textContent")
        print(j)
        print(ico_name)
        j += 1

    except TimeoutException:  # row j did not appear within the wait
        break

But the result is the same: it cannot crawl anything from item 51 onward, which means the page is not scrolling down (no new items are loaded).
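One way to confirm that the page really stops loading (rather than the XPath failing) is to count the rows in the market-ico-stat-container div before and after a scroll. The sketch below assumes that container id (taken from the XPath above) is still what the site uses; the helper names count_rows and scroll_loads_more are hypothetical, and driver is any Selenium WebDriver instance.

```python
import time

# CSS selector for the ICO rows; the container id comes from the XPath in the question
ROW_SELECTOR = "#market-ico-stat-container > div"


def count_rows(driver):
    """Return how many row divs the container currently holds."""
    return driver.execute_script(
        "return document.querySelectorAll(arguments[0]).length;", ROW_SELECTOR
    )


def scroll_loads_more(driver, pause=5.0, sleep=time.sleep):
    """Scroll to the bottom once and report (rows_before, rows_after, grew)."""
    before = count_rows(driver)
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    sleep(pause)  # give the lazy loader time to append new rows
    after = count_rows(driver)
    return before, after, after > before
```

If scroll_loads_more(driver) keeps reporting no growth even with a long pause, the site is most likely not serving more content to this session, rather than the locator being wrong.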

cruisepandey · Accepted Answer · 2022-01-30 06:07:11Z


You should scroll to each web element one by one with the help of execute_script.

Code:

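import time

from selenium import webdriver
from selenium.webdriver.chrome.service import Service

driver_path = "chromedriver"  # placeholder: path to your chromedriver binary
driver = webdriver.Chrome(service=Service(driver_path))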

driver.maximize_window()
wait = WebDriverWait(driver, 30)
driver.get("https://icodrops.com/ico-stats/")

j = 1
while True:
    ele = wait.until(EC.visibility_of_element_located((By.XPATH, f"(//div[@id='market-ico-stat-container']/div)[{j}]")))
    driver.execute_script("arguments[0].scrollIntoView(true);", ele)
    time.sleep(0.5)
    name = ele.find_element(By.XPATH, ".//descendant::h3//a").get_attribute('innerText')
    print(name)
    j = j + 1

    # the check below only caps the loop; remove or raise the limit as needed
    if j > 50:
        break

Output:

Ambire Wallet
Himo World
Highstreet
Decimated
Planet Sandbox
BENQI
DeHorizon
Mines Of Dalarnia
MonoX
Lobis
AntEx
Titan Hunters
Tempus
The Realm Defenders
Aurora
XDEFI Wallet
Libre DeFi
Genopets
Mytheria
ReSource
Defactor
PlaceWar
CryptoXpress
Cryowar
Numbers Protocol
Dragon Kart
Trusted Node
Cere Network
Elemon
Meta Spatial
YIN Finance
Ardana
CropBytes
Good Games Guild
Ariadne
ThorSwap
Solend
GooseFX
Galactic Arena
DotOracle
Scallop
AcknoLedger
Clearpool
Sandclock
ArtWallet
Aurory
BloXmove
WonderHero
Lazio Fan Token
Hero Arena

Process finished with exit code 0

Imports:

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

Without the j > 50 check, the loop has no exit condition of its own (it only ends when wait.until times out and raises a TimeoutException), so you should keep a maximum limit like this:

if j == 500:
    break
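A sketch of a bounded version of the loop: besides the counter, it also stops as soon as row j does not show up within a timeout, which is what wait.until would otherwise signal with a TimeoutException. It polls find_elements (which returns an empty list instead of raising), so it checks presence rather than visibility. The helper names wait_for and collect_names are hypothetical, and the row XPath is the one from the answer above.

```python
import time

XPATH = "xpath"  # same string value as selenium's By.XPATH
# the row XPath from the answer above; {} is the 1-based row index
ROW_XPATH = "(//div[@id='market-ico-stat-container']/div)[{}]"


def wait_for(condition, timeout, poll=0.5, sleep=time.sleep, clock=time.monotonic):
    """Call condition() until it returns something truthy or `timeout` seconds pass.

    Returns the truthy result, or None on timeout.
    """
    deadline = clock() + timeout
    while True:
        result = condition()
        if result:
            return result
        if clock() >= deadline:
            return None
        sleep(poll)


def collect_names(driver, max_items=500, timeout=30, pause=0.5, sleep=time.sleep):
    """Scroll row by row and collect ICO names.

    Stops after `max_items` rows, or earlier when row j does not appear within `timeout`.
    """
    names = []
    for j in range(1, max_items + 1):
        # find_elements returns [] instead of raising, so a missing row just means "keep polling"
        rows = wait_for(
            lambda: driver.find_elements(XPATH, ROW_XPATH.format(j)),
            timeout,
            sleep=sleep,
        )
        if not rows:  # row j never showed up: treat it as the end of the list
            break
        row = rows[0]
        driver.execute_script("arguments[0].scrollIntoView(true);", row)
        sleep(pause)  # give the page a moment to react to the scroll
        links = row.find_elements(XPATH, ".//h3//a")
        if links:
            names.append(links[0].get_attribute("innerText"))
    return names
```

With a live driver this would be called as collect_names(driver, max_items=500); the sleep and clock parameters exist only so the timing can be replaced in tests.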

However, the web application seems to detect the Selenium script.
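There is no way to know from the outside exactly what the site checks, but two easy signals to inspect are navigator.webdriver (true in a default Selenium-controlled Chrome) and a HeadlessChrome token in the user agent. The helper name automation_signals is hypothetical, and passing both checks does not guarantee that the site's bot protection will stop blocking the session.

```python
def automation_signals(driver):
    """Read two browser properties that bot-protection scripts commonly inspect."""
    # execute_script returns a JS array as a Python list
    webdriver_flag, user_agent = driver.execute_script(
        "return [navigator.webdriver === true, navigator.userAgent];"
    )
    return {
        "navigator.webdriver": bool(webdriver_flag),
        "headless user agent": "HeadlessChrome" in user_agent,
    }
```

Calling print(automation_signals(driver)) right after driver.get(...) shows whether these two flags are exposed in the current setup.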

edited Jan 30, 2022 at 6:07

answered Jan 29, 2022 at 15:27



3 Comments

Mun_sunouk Over a year ago

Thanks for the quick reply, but I did exactly what you told me and it didn't work. Can you help me? I added your suggestion to my question.

cruisepandey Over a year ago

I am running the same exact code and could get the above output.

cruisepandey Over a year ago

However, you'd still need to find a way to keep your Selenium script from being detected by the web application.

Max Daroshchanka · Accepted Answer · 2022-01-31 07:27:18Z


I was able to scroll this page with the following code changes:

1. Add extra options to make the script undetected (before this, it was blocked as a bot).
2. Add an ARROW_UP key press after the JS scroll; this does the magic, and the content started to load.
3. Add a 5-second wait so the new content has time to load.

import time

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys

chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--headless')
chrome_options.add_argument('--no-sandbox')
chrome_options.add_argument('--disable-dev-shm-usage')
chrome_options.add_argument("user-agent=Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0) like Gecko")
# extra options: hide the usual automation flags
chrome_options.add_experimental_option("excludeSwitches", ["enable-automation"])
chrome_options.add_argument('--disable-blink-features=AutomationControlled')

driver = webdriver.Chrome(service=Service('chromedriver'), options=chrome_options)
driver.set_window_size(1320, 550)

exchange_link = "https://icodrops.com/ico-stats/"
driver.get(exchange_link)

SCROLL_PAUSE_TIME = 5 #5 seconds
time.sleep(SCROLL_PAUSE_TIME)

# Get scroll height
last_height = driver.execute_script("return document.body.scrollHeight")

for x in range(0, 10):
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    driver.find_element(By.XPATH, "//body").send_keys(Keys.ARROW_UP)
    time.sleep(SCROLL_PAUSE_TIME)
    new_height = driver.execute_script("return document.body.scrollHeight")
    print('current Y: ' + str(new_height))
    if new_height == last_height:
        break
    last_height = new_height
driver.quit()  # end the whole session, not just the current window

Output:

current Y: 9792
current Y: 32542
current Y: 68942
current Y: 82592
current Y: 82592

I've tested this with Selenium 4, Chrome 97, Windows.

This code could be improved and optimized, but I hope it at least works.
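One possible improvement, following Reason 3 in the comments: instead of always sleeping a fixed 5 seconds, poll the page height and move on as soon as it grows, giving up after a timeout. This is the same idea as WebDriverWait(driver, 10).until(lambda d: d.execute_script("return document.body.scrollHeight") > last_height), written as a plain loop. The function name and defaults are assumptions; in real code you would use By.XPATH and Keys.ARROW_UP, and the string constants here only mirror their values so the function has no import-time dependency on Selenium.

```python
import time

ARROW_UP = "\ue013"  # same value as selenium's Keys.ARROW_UP
XPATH = "xpath"  # same value as selenium's By.XPATH
HEIGHT_JS = "return document.body.scrollHeight"


def scroll_until_stable(driver, max_rounds=10, timeout=10.0, poll=0.5,
                        sleep=time.sleep, clock=time.monotonic):
    """Scroll to the bottom repeatedly; stop when the page height stops growing.

    Returns the list of page heights observed after each successful load.
    """
    heights = [driver.execute_script(HEIGHT_JS)]
    for _ in range(max_rounds):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        # keyboard nudge from the answer above; it seems to trigger the lazy loader
        driver.find_element(XPATH, "//body").send_keys(ARROW_UP)
        # poll instead of a fixed sleep: continue as soon as the height grows
        deadline = clock() + timeout
        new_height = driver.execute_script(HEIGHT_JS)
        while new_height <= heights[-1] and clock() < deadline:
            sleep(poll)
            new_height = driver.execute_script(HEIGHT_JS)
        if new_height <= heights[-1]:  # nothing new arrived within the timeout
            break
        heights.append(new_height)
    return heights
```

The timeout plays the role of the old fixed pause: a fast page costs almost nothing, while a slow one still gets up to timeout seconds per round.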

edited Jan 31, 2022 at 7:27

answered Jan 29, 2022 at 16:13


6 Comments

Mun_sunouk Over a year ago

Thanks for your help, but I'm wondering how this differs from what I did before, because when I run your code I get exactly the same result as before: one loop iteration, and then new_height equals last_height.

Max Daroshchanka Over a year ago

@Mun_sunouk, I've found that the loop could exit after the first scroll for 3 reasons, all of which mean that the new content is not loaded after the scroll. Reason 1: the browser is still detected by bot protection (Cloudflare). You can confirm this by calling driver.get(exchange_link) twice: after the second page load you are redirected to an error page saying that your session was detected and blocked.
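Reason 1 can be checked roughly as described: load the page twice and see whether the second load is redirected or shows a different page. This is only a heuristic sketch; the function name blocked_after_reload is hypothetical, and it assumes the block page changes either the URL or the title, which may not hold for every protection setup (a page whose title changes on its own would also trigger it).

```python
def blocked_after_reload(driver, url):
    """Load url twice; report whether the second load ended up on a different page."""
    driver.get(url)
    first_title = driver.title
    driver.get(url)
    # ignore a trailing slash difference when comparing URLs
    redirected = driver.current_url.rstrip("/") != url.rstrip("/")
    return redirected or driver.title != first_title
```

For the page in the question this would be blocked_after_reload(driver, "https://icodrops.com/ico-stats/").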

Max Daroshchanka Over a year ago

Reason 2: the session is not blocked, but the JS scroll is not enough to trigger content loading. I checked that content starts to load if I additionally perform a real mouse scroll or keyboard scroll, so I added an ARROW_UP key press, which somehow fires the content-loading event.

Max Daroshchanka Over a year ago

Reason 3: the session is not blocked and content loading is triggered, but the content cannot load within 0.5 seconds, so the scroll height stays the same and the loop exits. I increased the wait to 5 seconds and it started to work. Those are the 3 changes I made to your implementation to make it work.

Max Daroshchanka Over a year ago

Unfortunately, I can't check this with Python on Linux right now to fully reproduce your environment, but these are the results of my research, and I can confirm that the 3 points I mentioned matter.
