2

Hello everyone, I'm trying to use Selenium and Scrapy to scrape some information from https://answers.yahoo.com/dir/index/discover?sid=396545663

I have tried different methods. I use Selenium with PhantomJS as the driver. The page uses infinite scrolling, and to scroll down I use this instruction:

elem.send_keys(Keys.PAGE_DOWN)

This simulates pressing the Page Down key. I use it instead of the JavaScript call:

browser.execute_script("window.scrollTo(0, document.body.scrollHeight);")

I avoid the JavaScript call because it "seems" to load fewer elements on the page.

The main problem is: how can I know when I have reached the bottom of the page? Because it is an infinite-scroll page, I can't tell when it ends. I need to keep scrolling down, but there is no element at the bottom that I can check for.

At the moment I use a timed loop, but that looks like a really crude solution.

Thanks

2 Answers

3

I would actually look for that "Loading..." indicator. Wait for it to become visible after every scroll; if you get a TimeoutException, there was no loading indicator this time and there are no more items to load.

Sample implementation:

from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# browser is the existing WebDriver instance from the question
wait = WebDriverWait(browser, 10)

while True:
    # do the scrolling
    browser.execute_script("window.scrollTo(0, document.body.scrollHeight);")

    try:
        # wait up to 10 seconds for the loading indicator to show up
        wait.until(EC.visibility_of_element_located((By.XPATH, "//*[. = 'Loading...']")))
    except TimeoutException:
        break  # no more posts were loaded - exit the loop

Not tested.
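
A complementary check, which is not part of the loop above, is to compare the page height before and after each scroll and stop once it no longer grows. The sketch below assumes that loading new items increases document.body.scrollHeight, which I have not verified for this particular site. The function name scroll_until_height_stable and the pause and max_rounds parameters are made up for illustration. It only needs an object with an execute_script method, such as the browser instance from the question.

```python
import time


def scroll_until_height_stable(driver, pause=2.0, max_rounds=100):
    """Scroll to the bottom until document.body.scrollHeight stops growing.

    Returns the number of scrolls after which the page had grown.
    Assumption: newly loaded items make the document taller.
    """
    last_height = driver.execute_script("return document.body.scrollHeight;")
    grew = 0
    for _ in range(max_rounds):  # hard cap so the loop always terminates
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # give the page time to fetch and render new items
        new_height = driver.execute_script("return document.body.scrollHeight;")
        if new_height == last_height:
            break  # height unchanged: assume nothing more was loaded
        last_height = new_height
        grew += 1
    return grew
```

Calling scroll_until_height_stable(browser) would replace the timed loop. The pause still has to be long enough for the slowest load, so this is a heuristic rather than a guarantee.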


6 Comments

Thanks for your response, however Yahoo doesn't have this kind of icon or any loading indicator.
@RedVelvet it has one, at the bottom when you scroll. Look for the "Loading ..." element that appears; it has id="ya-infinite-scroll-message" and the text "Loading ...".
Thanks @alecxe, I used wait.until(EC.visibility_of_element_located((By.ID, "ya-infinite-scroll-message"))) and it works, but it stops after 80 questions... it's strange.
EDIT: It's a great solution. The fault lies with the website, which seems to load a different number of elements each time, so the result changes on every run.
@RedVelvet yeah, I am afraid that waiting for the "loading" indicator might not be reliable enough. I've seen your follow-up question and will take a look if I find time. Thanks.
0

For example, you could create a parallel thread which checks the page for AJAX requests. If more than 10 seconds pass without a new request, you are at the end of the page. I have no other idea.
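
A simpler variation on this idea, without a thread and without intercepting requests, is to poll the number of loaded items and stop once it has not grown for a given idle period. This is a swapped-in technique: it watches the visible effect of the AJAX requests, not the requests themselves. The names scroll_until_idle, scroll, count_items, idle_timeout and poll are hypothetical, and the clock and sleep parameters exist only so the loop can be exercised without real waiting.

```python
import time


def scroll_until_idle(scroll, count_items, idle_timeout=10.0, poll=0.5,
                      clock=time.monotonic, sleep=time.sleep):
    """Keep scrolling until count_items() has not grown for idle_timeout seconds.

    scroll:      callable that scrolls the page once
    count_items: callable returning how many items are currently loaded
    Returns the final item count.
    """
    last_count = count_items()
    last_change = clock()
    while clock() - last_change < idle_timeout:
        scroll()
        sleep(poll)  # short pause between polls
        current = count_items()
        if current > last_count:
            last_count = current
            last_change = clock()  # new items arrived: reset the idle timer
    return last_count
```

With Selenium, scroll could be a lambda around browser.execute_script("window.scrollTo(0, document.body.scrollHeight);") and count_items a lambda returning len(browser.find_elements(...)) for whatever selector matches one question on the page. That selector is not given in this thread and would have to be found by inspecting the page.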
