
I hacked together the code below to try to scrape data from an HTML table into a data frame and then click a button to move to the next page, but it's giving me an error that says 'invalid selector'.

from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.wait import WebDriverWait
from bs4 import BeautifulSoup
import time
from time import sleep
import pandas as pd


browser = webdriver.Chrome("C:/Utility/chromedriver.exe")
wait = WebDriverWait(browser, 10)

url = 'https://healthdata.gov/dataset/Hospital-Detail-Map/tagw-nk32'
browser.get(url)

for x in range(1, 5950, 13):
    time.sleep(3)  # wait for the page to finish loading
    
    df = pd.read_html(browser.find_element_by_xpath("socrata-table frozen-columns").get_attribute('outerHTML'))[0]
    
    submit_button = browser.find_elements_by_xpath('pager-button-next')[0]
    submit_button.click()

I see the table, but I can't reference it.


Any idea what's wrong here?

I don't think that XPath selector is correct. Wouldn't it be something like //div[@class='socrata-table frozen-columns']? (Commented Dec 29, 2021 at 18:46)

1 Answer


I managed to find the button with find_elements_by_css_selector:

from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.wait import WebDriverWait
from bs4 import BeautifulSoup
import time
from time import sleep
import pandas as pd

browser = webdriver.Chrome("C:/Utility/chromedriver.exe")
wait = WebDriverWait(browser, 10)

url = 'https://healthdata.gov/dataset/Hospital-Detail-Map/tagw-nk32'
browser.get(url)

for x in range(1, 5950, 13):
    time.sleep(3)  # wait for the page to finish loading

    # use the full XPath from the comments; a bare class string is an invalid selector
    df = pd.read_html(
        browser.find_element_by_xpath(
            "//div[@class='socrata-table frozen-columns']").get_attribute('outerHTML'))[0]

    submit_button = browser.find_elements_by_css_selector('button.pager-button-next')[1]
    submit_button.click()

Sometimes pagination hangs, and submit_button.click() ends with an error:

selenium.common.exceptions.ElementClickInterceptedException: 
Message: element click intercepted: 
Element <button class="pager-button-next">...</button> 
is not clickable at point (182, 637). 
Other element would receive the click: <span class="site-name">...</span>

So consider increasing the timeout. For example, you can use this approach:


from selenium.common.exceptions import WebDriverException

def click_timeout(element, timeout: int = 60):
    # Retry the click once per second until it succeeds or the timeout expires.
    for _ in range(timeout):
        time.sleep(1)
        try:
            element.click()
            return  # stop retrying once the click goes through
        except WebDriverException:
            pass
    element.click()  # final attempt; raises if the element is still not clickable

This way, you click the element as soon as it becomes clickable.
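
For example, in the pagination loop above, the bare submit_button.click() can be swapped for the retry helper. A minimal sketch, reusing the same element lookups as before:

for x in range(1, 5950, 13):
    time.sleep(3)  # wait for the page to finish loading

    df = pd.read_html(
        browser.find_element_by_xpath(
            "//div[@class='socrata-table frozen-columns']").get_attribute('outerHTML'))[0]

    submit_button = browser.find_elements_by_css_selector('button.pager-button-next')[1]
    click_timeout(submit_button)  # retries for up to 60 seconds instead of failing immediately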


1 Comment

I used Paul's recommendation and Eugenij's recommendation, and came up with this:

df = pd.read_html(
    browser.find_element_by_xpath(
        "//div[@class='socrata-table frozen-columns']").get_attribute('outerHTML'))[0]

submit_button = browser.find_elements_by_css_selector('button.pager-button-next')[1]
submit_button.click()

However, after running through the loop multiple times, my data frame has a shape of (0, 64), so nothing is being scraped. Thoughts?
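
One possible cause is that the table's outer HTML is grabbed before its rows have rendered, so pd.read_html sees only the header. A minimal sketch that waits for at least one data row before scraping, reusing the wait object defined above (the 'div.socrata-table tbody tr' row selector is an assumption about the Socrata grid's markup, not confirmed from the page):

# wait until at least one data row is present before reading the table
wait.until(EC.presence_of_element_located(
    (By.CSS_SELECTOR, 'div.socrata-table tbody tr')))  # assumed row selector; verify in devtools

table_html = browser.find_element_by_xpath(
    "//div[@class='socrata-table frozen-columns']").get_attribute('outerHTML')
df = pd.read_html(table_html)[0]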
