I think the table simply took a long time to load.
Selenium is built to automate dynamic web pages, so it can deal with this problem.
Here are a few things I would try.
time.sleep()
import time

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By

chrome_options = Options()
chrome_options.add_argument('--headless')
chrome_options.add_argument('--no-sandbox')
chrome_options.add_argument('--disable-dev-shm-usage')
# The chrome_options= keyword is deprecated; pass options= instead.
driver = webdriver.Chrome(options=chrome_options)
driver.get(URL)
# Resize the window to the full page height so the screenshot covers everything.
fn = lambda X: driver.execute_script('return document.body.parentNode.scroll' + X)
driver.set_window_size(1024, fn('Height'))
time.sleep(10)  # <-- give the page time to finish loading
driver.save_screenshot("sample.png")
tables = driver.find_elements(By.TAG_NAME, "table")
for table in tables:
    table_str = table.get_attribute("innerHTML")
    similarity_tables = similarity(my_table_words, table_str)
    if similarity_tables > 90:
        time.sleep(10)  # <-- give the table time to finish rendering
        th = table.size['height']
        tw = table.size['width']
        tx = table.location['x']
        ty = table.location['y']
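A fixed `time.sleep(10)` either waits longer than needed or not long enough. One alternative, sketched here on the assumption that the table's size keeps changing while it loads, is to poll the size until two consecutive readings agree. `wait_for_stable` is a hypothetical helper, not part of Selenium, and the timeout and interval values are arbitrary.

```python
import time


def wait_for_stable(read_value, timeout=10.0, interval=0.5):
    """Poll read_value() until two consecutive readings are equal.

    Returns the stable reading, or raises TimeoutError if the value
    is still changing when the timeout expires.
    """
    deadline = time.monotonic() + timeout
    previous = read_value()
    while time.monotonic() < deadline:
        time.sleep(interval)
        current = read_value()
        if current == previous:
            return current
        previous = current
    raise TimeoutError("value did not settle within %.1f s" % timeout)
```

Inside the loop you could then write `size = wait_for_stable(lambda: table.size)` in place of the second `time.sleep(10)`. Selenium's own `WebDriverWait` covers the related case of waiting until an element exists at all.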
location_once_scrolled_into_view
You can try scrolling the page to the table before trying to get its location and size.
table.location_once_scrolled_into_view
th = table.size['height']
tw = table.size['width']
tx = table.location['x']
ty = table.location['y']
chrome_options = Options()
chrome_options.add_argument('--headless')
chrome_options.add_argument('--no-sandbox')
chrome_options.add_argument('--disable-dev-shm-usage')
# The chrome_options= keyword is deprecated; pass options= instead.
driver = webdriver.Chrome(options=chrome_options)
driver.get(URL)
## remove
# fn = lambda X: driver.execute_script('return document.body.parentNode.scroll' + X)
# driver.set_window_size(1024, fn('Height'))
driver.save_screenshot("sample.png")
tables = driver.find_elements(By.TAG_NAME, "table")
for table in tables:
    table_str = table.get_attribute("innerHTML")
    similarity_tables = similarity(my_table_words, table_str)
    if similarity_tables > 90:
        table.location_once_scrolled_into_view  # <-- scroll the table into view first
        th = table.size['height']
        tw = table.size['width']
        tx = table.location['x']
        ty = table.location['y']
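Selenium's `WebElement` also has a `rect` property that returns `x`, `y`, `width` and `height` in one dictionary, so all four values come from a single reading instead of four separate ones. A minimal sketch; `table_geometry` is a hypothetical helper name, and it only assumes that the object passed in has a `rect` attribute shaped like Selenium's.

```python
def table_geometry(element):
    """Return (tx, ty, tw, th) from a single read of element.rect.

    Assumes element.rect is a dict with the keys 'x', 'y', 'width'
    and 'height', which is what Selenium's WebElement.rect provides.
    """
    rect = element.rect
    return (rect["x"], rect["y"], rect["width"], rect["height"])
```

After `table.location_once_scrolled_into_view`, this would be used as `tx, ty, tw, th = table_geometry(table)`.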
Don't use headless mode
Keep in mind that the location and size of an element in a headless browser can differ from those in a non-headless browser.
chrome_options = Options()
## remove
# chrome_options.add_argument('--headless')
chrome_options.add_argument('--no-sandbox')
chrome_options.add_argument('--disable-dev-shm-usage')
# The chrome_options= keyword is deprecated; pass options= instead.
driver = webdriver.Chrome(options=chrome_options)
driver.get(URL)
fn = lambda X: driver.execute_script('return document.body.parentNode.scroll' + X)
driver.set_window_size(1024, fn('Height'))
driver.save_screenshot("sample.png")
tables = driver.find_elements(By.TAG_NAME, "table")
for table in tables:
    table_str = table.get_attribute("innerHTML")
    similarity_tables = similarity(my_table_words, table_str)
    if similarity_tables > 90:
        th = table.size['height']
        tw = table.size['width']
        tx = table.location['x']
        ty = table.location['y']
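If headless mode has to stay, for example on a server without a display, one thing that may narrow the difference is requesting the same explicit window size in both modes. The sketch below assumes Chrome's `--window-size=WIDTH,HEIGHT` switch; `build_chrome_args` is a hypothetical helper, and the 1920x1080 default is an arbitrary choice, not a recommended value.

```python
def build_chrome_args(headless=True, width=1920, height=1080):
    """Build the list of Chrome command-line switches used above."""
    args = ["--no-sandbox", "--disable-dev-shm-usage"]
    if headless:
        args.append("--headless")
    # Request the same fixed window size whether or not headless is on.
    args.append("--window-size=%d,%d" % (width, height))
    return args
```

Each entry can then be passed along with `for arg in build_chrome_args(): chrome_options.add_argument(arg)`. Whether this removes the discrepancy depends on the page, so compare the values from both modes before relying on it.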
Do your best.
If you've tried all of the above and none of it works, try adjusting the values by hand while you extract the size yourself.
Hope this helps.