
I'm new to Selenium and I'm trying to figure out how to scroll a page infinitely. I have tried almost everything suggested in other Stack Overflow answers:

1.

chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--headless')
chrome_options.add_argument('--no-sandbox')
chrome_options.add_argument('--disable-dev-shm-usage')
chrome_options.add_argument("user-agent=Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0) like Gecko")
driver = webdriver.Chrome('chromedriver', options=chrome_options)
driver.set_window_size(1320, 550)

exchange_link = "https://icodrops.com/ico-stats/"
driver.get(exchange_link)
wait = WebDriverWait(driver, 10)

SCROLL_PAUSE_TIME = 0.5

# Get scroll height
last_height = driver.execute_script("return document.body.scrollHeight")

while True:
    # Scroll down to bottom
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")

    # Wait to load page
    time.sleep(SCROLL_PAUSE_TIME)

    # Calculate new scroll height and compare with last scroll height
    new_height = driver.execute_script("return document.body.scrollHeight")
    if new_height == last_height:
        break
    last_height = new_height

I also tried sending keys to the last element and scrolling it into view:

from selenium.webdriver.common.keys import Keys
Number = wait.until(EC.presence_of_element_located((By.XPATH,'html[1]/body[1]/div[1]/div[1]/div[1]/main[1]/div[1]/div[4]/div[2]/div[1]/div[1]/div[1]')))
lastElement = Number.find_elements(By.XPATH,'div')[-1]
lastElement.send_keys(Keys.NULL)

Number = wait.until(EC.presence_of_element_located((By.XPATH,'html[1]/body[1]/div[1]/div[1]/div[1]/main[1]/div[1]/div[4]/div[2]/div[1]/div[1]/div[1]')))
lastElement = Number.find_elements(By.XPATH,'div')[-1]
lastElement.location_once_scrolled_into_view

I also tried these scripts, among others:

driver.execute_script("var scrollingElement = (document.scrollingElement || document.body);scrollingElement.scrollTop = scrollingElement.scrollHeight;")

driver.execute_script("document.getElementById('mydiv').scrollIntoView();")

I don't know what else I can do. I have spent a lot of time trying to fix this.

Thanks for all the replies, but nothing works. This is what I tried:

j = 1
while True:
    if j == 900:
        break

    try:
        ele = wait.until(EC.visibility_of_element_located((By.XPATH, f"(//div[@id='market-ico-stat-container']/div)[{j}]")))
        driver.execute_script("arguments[0].scrollIntoView(true);", ele)
        ico_name = wait.until(EC.presence_of_element_located((By.XPATH,f'/html[1]/body[1]/div[1]/div[1]/div[1]/main[1]/div[1]/div[5]/div[2]/div[1]/div[1]/div[1]/div[{j}]/a[1]/div[1]/div[1]/div[2]/h3/a'))).get_attribute("textContent")
        print(j)
        print(ico_name)
        j += 1

    except Exception:
        # Stop at the first row that cannot be located in time
        break

But the result is the same: from item 51 onward it can't crawl anything, which means the page is not scrolling down.

2 Answers


You should scroll to each web element one by one using execute_script.

Code:

# driver_path is the path to your chromedriver executable
driver = webdriver.Chrome(driver_path)

driver.maximize_window()
wait = WebDriverWait(driver, 30)
driver.get("https://icodrops.com/ico-stats/")

j = 1
while True:
    ele = wait.until(EC.visibility_of_element_located((By.XPATH, f"(//div[@id='market-ico-stat-container']/div)[{j}]")))
    driver.execute_script("arguments[0].scrollIntoView(true);", ele)
    time.sleep(0.5)
    name = ele.find_element(By.XPATH, ".//descendant::h3//a").get_attribute('innerText')
    print(name)
    j = j + 1

    # Stop after 50 items so the loop does not run forever
    if j > 50:
        break

Output:

Ambire Wallet
Himo World
Highstreet
Decimated
Planet Sandbox
BENQI
DeHorizon
Mines Of Dalarnia
MonoX
Lobis
AntEx
Titan Hunters
Tempus
The Realm Defenders
Aurora
XDEFI Wallet
Libre DeFi
Genopets
Mytheria
ReSource
Defactor
PlaceWar
CryptoXpress
Cryowar
Numbers Protocol
Dragon Kart
Trusted Node
Cere Network
Elemon
Meta Spatial
YIN Finance
Ardana
CropBytes
Good Games Guild
Ariadne
ThorSwap
Solend
GooseFX
Galactic Arena
DotOracle
Scallop
AcknoLedger
Clearpool
Sandclock
ArtWallet
Aurory
BloXmove
WonderHero
Lazio Fan Token
Hero Arena

Process finished with exit code 0

Imports:

import time

from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

Without a limit, the loop above never breaks and runs forever. To avoid this, introduce a maximum limit like this:

if j == 500:
    break
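
As a rough sketch of the same idea, the loop can also stop by itself once scrolling no longer produces new rows, with the maximum limit kept as a safety net. The function name `scroll_rows` and its parameters are made up for illustration; the XPath is the one used above. It passes the locator strategy as the plain string "xpath", which is the value of By.XPATH, so the block needs no Selenium imports. It assumes new rows show up within `pause` seconds after the last row is scrolled into view, which may not hold on a slow connection.

```python
import time

# Row locator taken from the answer above
ROW_XPATH = "//div[@id='market-ico-stat-container']/div"


def scroll_rows(driver, max_rows=500, pause=0.5):
    """Scroll each row into view; return how many rows were visited.

    Stops when max_rows is reached or when no new rows have appeared
    since the previous pass.
    """
    seen = 0
    while seen < max_rows:
        rows = driver.find_elements("xpath", ROW_XPATH)
        if len(rows) <= seen:
            break  # nothing new was loaded after the last scroll
        for row in rows[seen:max_rows]:
            driver.execute_script("arguments[0].scrollIntoView(true);", row)
            time.sleep(pause)  # give lazy-loaded rows a moment to appear
        seen = min(len(rows), max_rows)
    return seen
```

Calling `scroll_rows(driver, max_rows=500)` after `driver.get(...)` would replace the `while True` loop; reading the name of each row can be added inside the `for` loop.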

However, the web application seems to detect the Selenium script.


3 Comments

Thanks for the quick reply. I did exactly what you told me, but it didn't work. Can you help me? I added your suggestion to my question.
I am running exactly the same code and get the output above.
However, you'd still need to find a way to keep your Selenium script from being detected by the web application.

I was able to scroll the page with the following code changes:

  1. Add extra options so the script is not detected (it was blocked as a bot before).
  2. Send an ARROW_UP key press after the JS scroll. With this, the content started to load.
  3. Wait 5 seconds for the new content to load.

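import time

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys

chrome_options = webdriver.ChromeOptions()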
chrome_options.add_argument('--headless')
chrome_options.add_argument('--no-sandbox')
chrome_options.add_argument('--disable-dev-shm-usage')
chrome_options.add_argument("user-agent=Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0) like Gecko")
#extra options
chrome_options.add_experimental_option("excludeSwitches", ["enable-automation"])
chrome_options.add_argument('--disable-blink-features=AutomationControlled')

driver = webdriver.Chrome('chromedriver', options=chrome_options)
driver.set_window_size(1320, 550)

exchange_link = "https://icodrops.com/ico-stats/"
driver.get(exchange_link)

SCROLL_PAUSE_TIME = 5  # seconds
time.sleep(SCROLL_PAUSE_TIME)

# Get scroll height
last_height = driver.execute_script("return document.body.scrollHeight")

for x in range(0, 10):
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    driver.find_element(By.XPATH, "//body").send_keys(Keys.ARROW_UP)
    time.sleep(SCROLL_PAUSE_TIME)
    new_height = driver.execute_script("return document.body.scrollHeight")
    print('current Y: ' + str(new_height))
    if new_height == last_height:
        break
    last_height = new_height
driver.close()

Output:

current Y: 9792
current Y: 32542
current Y: 68942
current Y: 82592
current Y: 82592

I've tested this with Selenium 4, Chrome 97, Windows.

This code could be improved and optimized, but I hope it at least works.
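
One possible improvement, sketched under assumptions: instead of always sleeping for the full 5 seconds, poll the scroll height and continue as soon as it grows, giving up only after a timeout. The names `scroll_to_bottom`, `nudge`, `timeout` and `poll` are made up for illustration. `nudge` is an optional callback where the ARROW_UP key press from the code above could go. This only changes the waiting strategy; it does nothing about bot detection.

```python
import time

GET_HEIGHT = "return document.body.scrollHeight"
SCROLL_DOWN = "window.scrollTo(0, document.body.scrollHeight);"


def scroll_to_bottom(driver, max_scrolls=10, timeout=5.0, poll=0.25, nudge=None):
    """Scroll until the page height stops growing; return the final height."""
    last_height = driver.execute_script(GET_HEIGHT)
    for _ in range(max_scrolls):
        driver.execute_script(SCROLL_DOWN)
        if nudge is not None:
            # e.g. lambda d: d.find_element(By.XPATH, "//body").send_keys(Keys.ARROW_UP)
            nudge(driver)
        # Poll until the height grows or the timeout expires
        deadline = time.monotonic() + timeout
        new_height = last_height
        while time.monotonic() < deadline:
            new_height = driver.execute_script(GET_HEIGHT)
            if new_height > last_height:
                break
            time.sleep(poll)
        if new_height <= last_height:
            break  # no growth within the timeout: treat as the bottom
        last_height = new_height
    return last_height
```

With this, a fast page is scrolled without waiting the full timeout on every step, and only the final check (when nothing more loads) costs the whole `timeout`.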

6 Comments

Thanks for your help, but I'm wondering how this differs from what I did before. When I run your code, I get exactly the same result as before: one loop iteration, and then new_height equals last_height.
@Mun_sunouk, I've found that the loop can exit after the first scroll for 3 reasons, each of which means that new content did not load after the scroll. Reason 1: the browser is still detected by bot protection (Cloudflare). You can confirm this by calling driver.get(exchange_link) twice. After the second page load, you are redirected to an error page saying that your session was detected and blocked.
Reason 2: the session is not blocked, but the JS scroll alone is not enough to trigger content loading. I found that content started to load when I additionally performed a real mouse scroll or keyboard scroll. So I added an ARROW_UP key press, which somehow fires the content-loading event.
Reason 3: the session is not blocked and content loading is triggered, but the content cannot load within 0.5 seconds, so the scroll height stays the same and the loop exits. I changed the wait to 5 seconds and it started to work. So I made 3 changes to your implementation to make it work.
Unfortunately, I can't check this with Python on Linux right now to fully reproduce your environment. Still, these are the results of my research, and I can confirm that the 3 points I mentioned matter.