Scraping javascript table with a scroll using selenium

Question

I am trying to scrape a table that is generated by JavaScript, but I am struggling. My code so far is:

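from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

driver = webdriver.Chrome()
driver.get("https://af.ktnlandscapes.com/")

# get table -- first wait for the rows to be present in the DOM
WebDriverWait(driver, 10).until(EC.presence_of_all_elements_located((By.XPATH, "//*[@id='list-view']/tbody/tr")))
table = driver.find_element(By.XPATH, "//*[@id='list-view']")

# get rows
rows = table.find_elements(By.XPATH, "tbody/tr")

# iterate over the rows and print each row's "listing" attribute
for row in rows:
    print(row.get_attribute("listing"))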

I want to scrape the "listing=" numbers from the rows of the table. I am not sure how to extract them, and I am also struggling to understand how to force the page to load the rest of the rows, since they only appear once you scroll down the table a bit.

"There are 279 unique listings that match your search". Maybe you can get this number?
– Yun, Commented Jan 21, 2020 at 10:28
This worked fantastically! My problem now is that some of the rows only load once you have scrolled the table a bit...
– Jack, Commented Jan 21, 2020 at 10:47
For scrolling, you may have to look for JavaScript code that can be run with driver.execute_script("javascript_code").
– furas, Commented Jan 21, 2020 at 10:49

Yun · Accepted Answer · 2020-01-21 10:54:17Z

5

Try the code below:

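import time

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

driver = webdriver.Chrome()
driver.get("https://af.ktnlandscapes.com/")

# get table -- first wait for the rows to be present in the DOM
WebDriverWait(driver, 10).until(EC.presence_of_all_elements_located((By.XPATH, "//*[@id='list-view']/tbody/tr")))
table = driver.find_element(By.XPATH, "//*[@id='list-view']")

# keep scrolling to the last loaded row until the row count stops growing
get_number = 0
while True:
    count = get_number  # row count from the previous pass
    rows = table.find_elements(By.XPATH, "tbody/tr[@class='list-view-listing']")
    driver.execute_script("arguments[0].scrollIntoView();", rows[-1])  # scroll to last row
    get_number = len(rows)
    print(get_number)
    time.sleep(1)  # give the page time to load the next batch
    if get_number == count:
        break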

The loop prints the running row count on each pass. Querying the table in the web console shows that there are actually 339 rows.
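To get the listing numbers themselves rather than just the row count, the scroll loop above could be wrapped in a function that reads each row's "listing" attribute once the count stops growing. This is only a sketch under some assumptions: `collect_listings`, `pause` and `max_rounds` are names made up for illustration, and the string "xpath" is passed in place of `By.XPATH` (they are the same value) so the snippet needs nothing beyond the standard library.

```python
import time

ROW_XPATH = "tbody/tr[@class='list-view-listing']"


def collect_listings(driver, table, pause=1.0, max_rounds=100):
    """Scroll the table until no new rows load, then return the listing values.

    `driver` and `table` are the Selenium objects from the code above.
    `max_rounds` is a hypothetical safety cap so the loop cannot run forever.
    """
    seen = 0
    for _ in range(max_rounds):
        rows = table.find_elements("xpath", ROW_XPATH)  # "xpath" == By.XPATH
        if not rows or len(rows) == seen:
            break  # nothing loaded at all, or no new rows since the last pass
        seen = len(rows)
        # Scrolling the last row into view triggers loading of the next batch.
        driver.execute_script("arguments[0].scrollIntoView();", rows[-1])
        time.sleep(pause)  # give the page time to append the new rows

    # Re-query once at the end so every loaded row is included.
    rows = table.find_elements("xpath", ROW_XPATH)
    return [row.get_attribute("listing") for row in rows]
```

With the `driver` and `table` from the code above, `listings = collect_listings(driver, table)` should return one string per loaded row.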

answered Jan 21, 2020 at 10:54

James · Accepted Answer · 2020-01-21 12:00:25Z

2

This is probably simpler to do using requests. If you inspect the page in Chrome/Firefox, you will see that it sends GET requests for more data as you scroll the list area. The endpoint is: /list-view-load.php?landscape_id=31&landscape_nid=33192&region=All&category=All&subcategory=All&search=&custom1=&custom2=&custom3=&custom4=&custom5=&offset=20, with the offset increasing by 20 for each request.

You can imitate this via:

import requests
from lxml import html

sess = requests.Session()
url = ('https://af.ktnlandscapes.com/sites/all/themes/landscape_tools/functions'
       '/list-view-load.php?landscape_id=31&landscape_nid=33192&region=All&'
       'category=All&subcategory=All&search=&custom1=&custom2=&custom3=&'
       'custom4=&custom5=&offset={offset}')

gets = []
for i in range(50):
    data = sess.get(url.format(offset=20*i)).json().get('data')
    if not data:
        break
    gets.append(data)
    print(f'\rfinished request {i}', end='')
else:
    print('There is more data!! Increase the range.')

listings = []
for g in gets:
    h = html.fromstring(g)
    listings.extend(h.xpath('tr/@listing'))

print('Number of listings:', len(listings))
# prints:
Number of listings: 339

listings
# returns
['91323', '91528', '91282', '91529', '91572', '91356', '91400', '91445',
 '91373', '91375', '91488', '91283', '91294', '91324', '91423', '91325',
 '91475', '91415', '91382', '91530', '91573', '91295', '91326', '91424',
 ...
 '91568', '91592', '91613', '91569', '91593', '91594', '91570', '91352',
 '91414', '91486', '91353', '91304', '91311', '91354', '91399', '91602',
 '91571', '91610', '103911']
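If the long query string is awkward to edit, it could also be built from a dictionary, which makes the offset arithmetic explicit. A minimal sketch using only the standard library: the parameter names and values are copied from the URL above, while `page_url`, `BASE` and `PAGE_SIZE` are names invented here for illustration.

```python
from urllib.parse import urlencode

BASE = ('https://af.ktnlandscapes.com/sites/all/themes/landscape_tools/functions'
        '/list-view-load.php')
PAGE_SIZE = 20  # the offset grows by 20 per request


def page_url(page, base=BASE):
    """Return the URL for the given zero-based page of results."""
    params = {
        'landscape_id': 31,
        'landscape_nid': 33192,
        'region': 'All',
        'category': 'All',
        'subcategory': 'All',
        'search': '',
        'custom1': '',
        'custom2': '',
        'custom3': '',
        'custom4': '',
        'custom5': '',
        'offset': PAGE_SIZE * page,  # page 0 -> 0, page 1 -> 20, ...
    }
    return f'{base}?{urlencode(params)}'
```

In the loop above, `sess.get(page_url(i))` would then take the place of `sess.get(url.format(offset=20*i))`.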

edited Jan 21, 2020 at 12:00

answered Jan 21, 2020 at 10:56

4 Comments

Jack Over a year ago

I'm not sure this is quite right, shouldn't there only be 339 results in the list?

James Over a year ago

The output is abbreviated for space. That is what the ... represents.

Jack Over a year ago

Sorry I'm a little confused, could you clarify the get 'url' I should be using?

James Over a year ago

Sorry, I dropped its assignment. Updated now.
