29

I am trying to scroll to the end of a page so that all the data becomes visible and I can extract it. I looked for a command to do this and found one for Java (driver.executeScript), but not for Python. Right now I am making the computer press the END key a thousand times:

for _ in range(1000):
    driver.find_element_by_tag_name('body').send_keys(Keys.END)

I also tried driver.execute_script("window.scrollTo(0, document.body.scrollHeight);"), but it only scrolls to the end of the currently loaded page, which is the same thing the END key does. Once the bottom of the page is reached, the next content loads, but the page does not scroll again.

I know there will be a very nice alternative for this.

How do I scroll to the end of the page using selenium in Python?

5
  • See if this helps: http://stackoverflow.com/a/27760083/4193730 Commented Sep 4, 2015 at 6:25
  • possible duplicate of How can I scroll a web page using selenium webdriver in python? Commented Sep 4, 2015 at 6:30
  • No, this doesn't work, because driver.execute_script("window.scrollTo(0, document.body.scrollHeight);") only scrolls to the end of the loaded page, which is the same thing the END key does. Once at the bottom of the page, the next content loads, but then it doesn't scroll again. Commented Sep 4, 2015 at 6:33
  • Is that page lazyloading content? Do you page down, it loads another chunk of content, page down, repeat? Or is it just a really long page? CTRL+END should jump to the very end of the page in one shot. Commented Sep 4, 2015 at 18:36
  • No, CTRL+END does the same thing as END. Commented Sep 4, 2015 at 20:02

7 Answers

32

Well I finally figured out a solution:

import time

scroll_js = ("window.scrollTo(0, document.body.scrollHeight);"
             "return document.body.scrollHeight;")

# Scroll to the bottom and remember the page height.
lenOfPage = driver.execute_script(scroll_js)
match = False
while not match:
    lastCount = lenOfPage
    time.sleep(3)  # give lazily loaded content time to arrive
    lenOfPage = driver.execute_script(scroll_js)
    # Stop once scrolling no longer increases the page height.
    if lastCount == lenOfPage:
        match = True

9 Comments

It's quite slow though, isn't it possible to speed it up somehow?
@SebastianNielsen Probably late, but just set time.sleep() as low as possible without going so fast that the browser or the site thinks you are a bot. 0.5 seconds seems to work well.
@user3326078 That is not very practical, because internet speed varies. The lowest workable sleep time depends on the connection speed. It would be awesome to find a solution that doesn't depend on sleep, something along the lines of waiting for the page to load and then scrolling again.
@SebastianNielsen Yea I agree wish there was a more robust/dynamic solution :/
I just got an idea. What if you scroll to the bottom of the page and wait for the DOM's height to increase? When the height increases, the site must have loaded more content, so we are no longer at the bottom. This is repeated until the page takes more than x seconds to grow in height after we reach the bottom.
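The idea in the comment above could be sketched roughly as follows. This is only an illustration, not code from the thread: the function name, the timeout and poll parameters, and their defaults are all made up, and it assumes the page grows document.body.scrollHeight when it loads more content. It only relies on the driver's execute_script method.

```python
import time


def scroll_until_height_stops_growing(driver, timeout=10.0, poll=0.2):
    """Scroll to the bottom until the page height has not grown for
    `timeout` seconds. Returns the final height.

    Hypothetical helper: the name and parameters are illustrative.
    """
    get_height = "return document.body.scrollHeight;"
    scroll = "window.scrollTo(0, document.body.scrollHeight);"

    last_height = driver.execute_script(get_height)
    while True:
        driver.execute_script(scroll)
        deadline = time.monotonic() + timeout
        grew = False
        # Poll the height instead of sleeping for a fixed time, so a
        # fast page is not slowed down by a pessimistic sleep value.
        while time.monotonic() < deadline:
            new_height = driver.execute_script(get_height)
            if new_height > last_height:
                last_height = new_height
                grew = True
                break
            time.sleep(poll)
        if not grew:
            # No growth within the timeout: assume the real bottom.
            return last_height
```

With this shape, the timeout is only paid in full once, at the very end; every earlier scroll continues as soon as the height changes.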
23

This can be done in one line by scrolling to document.body.scrollHeight

driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")

2 Comments

This would not work on pages such as Facebook, which keep increasing the document's height whenever you reach the bottom of the page.
Fair point @SebastianNielsen, the solution is to use a while loop and break when the document height stops changing
17

None of these were working for me, but the below solution did:

driver.get("https://www.youtube.com/user/teachingmensfashion/videos")


import time


def scroll_to_bottom(driver):
    # Read the vertical scroll position; fall back to scrollTop on
    # browsers that lack window.pageYOffset.
    get_position = (
        "return (window.pageYOffset !== undefined) ?"
        " window.pageYOffset : (document.documentElement ||"
        " document.body.parentNode || document.body).scrollTop;")

    old_position = 0
    new_position = None

    while new_position != old_position:
        # Get old scroll position
        old_position = driver.execute_script(get_position)
        # Sleep and scroll
        time.sleep(1)
        driver.execute_script((
                "var scrollingElement = (document.scrollingElement ||"
                " document.body);scrollingElement.scrollTop ="
                " scrollingElement.scrollHeight;"))
        # Get new position
        new_position = driver.execute_script(get_position)

scroll_to_bottom(driver)

1 Comment

worked perfectly for me
6

You can use scrollingElement with scrollTop and scrollHeight to scroll to the end of a page.

driver.execute_script("var scrollingElement = (document.scrollingElement || document.body);scrollingElement.scrollTop = scrollingElement.scrollHeight;")

References :

  1. Scroll Automatically to the Bottom of the Page
  2. Document.scrollingElement - Web APIs | MDN
  3. Element.scrollHeight - Web APIs | MDN
  4. Element.scrollTop - Web APIs | MDN

Comments

0

Since no link to the website was provided, I am going to assume that there is some kind of See More/Load More clickable element on the page. Here is what I like to do, and it's pretty simple.

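import time
from selenium.common.exceptions import (
    NoSuchElementException, StaleElementReferenceException)

for _ in range(10000):  # upper bound on the number of clicks
    try:
        driver.find_element_by_xpath('//*[@id="load_more"]').click()
    except StaleElementReferenceException:
        pass  # the button was re-rendered; retry on the next pass
    except NoSuchElementException:
        break  # assumption: no button left means everything is loaded
    time.sleep(2)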

Comments

0

# Go to the element that actually scrolls, such as a tbody
element_in_table = self.driver.find_element(By.XPATH, html_tbody_path)

ActionChains(self.driver).move_to_element(element_in_table).perform()
element_in_table.click()
self.driver.execute_script(
    "window.scrollTo(0, document.body.scrollHeight);"
    "var lenOfPage=document.body.scrollHeight;return lenOfPage;")
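If the content sits inside an element with its own scrollbar, scrolling the window may not move it. A minimal sketch of scrolling the element itself, assuming the located element is the one whose overflow actually scrolls (the helper name is made up for illustration):

```python
def scroll_element_to_bottom(driver, element):
    """Scroll `element` (not the window) to its bottom.

    Hypothetical helper. Returns the element's scrollTop afterwards.
    """
    # Extra arguments to execute_script are exposed to the JavaScript
    # snippet as arguments[0], arguments[1], and so on.
    driver.execute_script(
        "arguments[0].scrollTop = arguments[0].scrollHeight;", element)
    return driver.execute_script("return arguments[0].scrollTop;", element)
```

Comparing the returned scrollTop between calls is one way to tell whether the element moved any further.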

0

To add some more context: in some cases you know neither the element you are waiting for nor the number of scrolls needed to reach the bottom of the page. What I have found useful in such scenarios is to compare the entire page source before and after scrolling. As long as new content keeps loading, the source differs each time, regardless of its contents. Here's my suggestion:

html = driver.find_element(By.TAG_NAME, 'html')
while True:
    s1 = driver.page_source
    time.sleep(random.randint(1, 3))
    html.send_keys(Keys.PAGE_DOWN)
    html.send_keys(Keys.END)
    time.sleep(random.randint(1, 3))
    html.send_keys(Keys.END)
    s2 = driver.page_source
    if s1==s2:
        break

Eventually, I'd delete s1 and s2 - large unneeded chunks of text.
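One way to avoid holding two full copies of the page source is to compare hashes instead. The following is a sketch under assumptions, not the answer's own code: the function name, pause, and max_rounds are made up, and it scrolls with execute_script rather than sending keys. Pages whose source changes on its own (ads, timers) would never look stable, which is why the loop is capped.

```python
import hashlib
import time


def scroll_until_source_stable(driver, pause=1.0, max_rounds=100):
    """Scroll until the page source stops changing.

    Hypothetical helper. Returns True if the source stabilised and
    False if max_rounds was reached first.
    """
    def digest():
        # Keep only a fixed-size hash of the source, not the source.
        source = driver.page_source.encode("utf-8")
        return hashlib.sha256(source).hexdigest()

    previous = digest()
    for _ in range(max_rounds):
        driver.execute_script(
            "window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # fixed pause; tune for the site
        current = digest()
        if current == previous:
            return True
        previous = current
    return False
```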

1 Comment

Simplified: driver.find_element(By.XPATH, "//html").send_keys(Keys.END)
