Selenium: scroll down a page and parse it with Python

Question

I am trying to parse a page on ozon.ru.

I have a problem: I need to scroll the page and then get all of its HTML. When I scroll, the page height changes, but the parsing result is wrong: it only contains items from the first page. Do I need to refresh the page's HTML after scrolling, and if so, how?

import time

from bs4 import BeautifulSoup
from selenium import webdriver


def get_link_product_ozon(url):
    # Assumes chromedriver can be found on PATH
    driver = webdriver.Chrome()
    try:
        driver.get(url)
        last_height = driver.execute_script("return document.body.scrollHeight")
        for _ in range(80):
            # Scroll to the bottom and give the page time to load more items
            driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
            time.sleep(3)
            new_height = driver.execute_script("return document.body.scrollHeight")
            if new_height == last_height:
                break  # height stopped growing, nothing more was loaded
            last_height = new_height
        # page_source is read after scrolling, so it includes the loaded items
        soup = BeautifulSoup(driver.page_source, "lxml")
        all_links = soup.find_all('div', class_='bOneTile inline jsUpdateLink mRuble ')
        for link in all_links:
            print(link.attrs['data-href'])
    finally:
        driver.quit()

CtheSky · Accepted Answer · 2017-10-12 13:52:07Z


The divs that are loaded after scrolling don't have the class mRuble, and you are matching against the exact class string, so they are skipped. Try selecting on the classes they share, or on the data-href attribute:

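# Match on the classes shared by all tiles, whatever other classes they have
all_links = soup.select('div.bOneTile.inline.jsUpdateLink')
# Or match any div that carries a data-href attribute
all_links = soup.select('div[data-href]')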
...
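
To see the difference without a browser, here is a minimal sketch on made-up markup. The two divs and their data-href values are hypothetical stand-ins for the real tiles, not copied from ozon.ru, and html.parser is used only to avoid the lxml dependency. The first div plays the role of a tile from the initial page, the second a tile added after scrolling.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the second tile lacks the "mRuble" class,
# as described for the tiles loaded after scrolling.
html = """
<div class="bOneTile inline jsUpdateLink mRuble" data-href="/item/1"></div>
<div class="bOneTile inline jsUpdateLink" data-href="/item/2"></div>
"""

soup = BeautifulSoup(html, "html.parser")

# A class_ string with several classes is compared against the whole
# class attribute, so only the tile with exactly these classes matches.
exact_links = [
    div["data-href"]
    for div in soup.find_all("div", class_="bOneTile inline jsUpdateLink mRuble")
]

# A CSS selector only requires the listed classes to be present.
class_links = [
    div["data-href"] for div in soup.select("div.bOneTile.inline.jsUpdateLink")
]

# Selecting on the attribute ignores the classes entirely.
attr_links = [div["data-href"] for div in soup.select("div[data-href]")]

print(exact_links)
print(class_links)
print(attr_links)
```

The attribute selector is the loosest of the three, so it may also pick up unrelated divs if the real page uses data-href elsewhere.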
