I am trying to locate tables on a webpage for which I do not have an ID, XPath, or class name. To find the table I want, I compare its contents against every table on the page and pick the closest match. The match itself works, but element.size and element.location return an incorrect size and position for it. Am I doing something wrong in the following code, or is there a way to fix this?

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By

chrome_options = Options()
chrome_options.add_argument('--headless')
chrome_options.add_argument('--no-sandbox')
chrome_options.add_argument('--disable-dev-shm-usage')
# The 'chrome_options=' keyword is deprecated in favour of 'options='
driver = webdriver.Chrome(options=chrome_options)
driver.get(URL)
# Resize the window to the full page height so the screenshot covers the whole page
fn = lambda X: driver.execute_script('return document.body.parentNode.scroll' + X)
driver.set_window_size(1024, fn('Height'))
driver.save_screenshot("sample.png")
tables = driver.find_elements(By.TAG_NAME, "table")
for table in tables:
    table_str = table.get_attribute("innerHTML")
    # similarity() and my_table_words are defined elsewhere
    similarity_tables = similarity(my_table_words, table_str)
    if similarity_tables > 90:
        th = table.size['height']
        tw = table.size['width']
        tx = table.location['x']
        ty = table.location['y']

Using this code, I am able to find the desired table, but the location and size returned for the element are incorrect.

1 Answer

I think the table takes a long time to load, so its size and location are read before the layout has settled.
Selenium drives a real browser, so it can wait for dynamically loaded content before measuring it.
Here are a few things you can try.

time.sleep()

import time

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By

chrome_options = Options()
chrome_options.add_argument('--headless')
chrome_options.add_argument('--no-sandbox')
chrome_options.add_argument('--disable-dev-shm-usage')
# The 'chrome_options=' keyword is deprecated in favour of 'options='
driver = webdriver.Chrome(options=chrome_options)
driver.get(URL)
fn = lambda X: driver.execute_script('return document.body.parentNode.scroll' + X)
driver.set_window_size(1024, fn('Height'))
time.sleep(10)  # <--- give the page time to finish rendering after the resize
driver.save_screenshot("sample.png")
tables = driver.find_elements(By.TAG_NAME, "table")
for table in tables:
    table_str = table.get_attribute("innerHTML")
    similarity_tables = similarity(my_table_words, table_str)
    if similarity_tables > 90:
        time.sleep(10)  # <--- wait again before reading the geometry
        th = table.size['height']
        tw = table.size['width']
        tx = table.location['x']
        ty = table.location['y']
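
A fixed sleep either wastes time or is too short on a slow page. As an alternative, here is a minimal sketch that polls until two consecutive readings of the geometry are identical. The helper name wait_for_stable_rect and its parameters are hypothetical, not part of Selenium; it only assumes you can pass it a callable that returns the current geometry, for example one that reads the element's rect property.

```python
import time


def wait_for_stable_rect(get_rect, timeout=10.0, interval=0.25):
    """Poll get_rect() until two consecutive readings are equal.

    get_rect is any zero-argument callable returning a comparable value,
    such as a dict with x, y, width and height.
    Raises TimeoutError if the value keeps changing for `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    previous = get_rect()
    while time.monotonic() < deadline:
        time.sleep(interval)
        current = get_rect()
        if current == previous:
            # Nothing moved between two readings: treat the layout as settled.
            return current
        previous = current
    raise TimeoutError("element geometry did not settle in time")
```

Inside the loop above, this could replace the second sleep with something like rect = wait_for_stable_rect(lambda: table.rect), after which rect['x'], rect['y'], rect['width'] and rect['height'] hold the settled values.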

location_once_scrolled_into_view

You can try scrolling the page to the table before trying to get its location and size.

# Reading this property scrolls the element into view as a side effect
table.location_once_scrolled_into_view
th = table.size['height']
tw = table.size['width']
tx = table.location['x']
ty = table.location['y']

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By

chrome_options = Options()
chrome_options.add_argument('--headless')
chrome_options.add_argument('--no-sandbox')
chrome_options.add_argument('--disable-dev-shm-usage')
# The 'chrome_options=' keyword is deprecated in favour of 'options='
driver = webdriver.Chrome(options=chrome_options)
driver.get(URL)
## remove
# fn = lambda X: driver.execute_script('return document.body.parentNode.scroll' + X)
# driver.set_window_size(1024, fn('Height'))
driver.save_screenshot("sample.png")
tables = driver.find_elements(By.TAG_NAME, "table")
for table in tables:
    table_str = table.get_attribute("innerHTML")
    similarity_tables = similarity(my_table_words, table_str)
    if similarity_tables > 90:
        table.location_once_scrolled_into_view  # <--- scroll to the table first
        th = table.size['height']
        tw = table.size['width']
        tx = table.location['x']
        ty = table.location['y']
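
If the numbers are meant to be used for cropping the screenshot, keep in mind that scrolling changes what the viewport shows. The sketch below assumes the element rectangle is relative to the top-left corner of the document and measured in CSS pixels (this is how the W3C WebDriver specification describes it), while the screenshot covers only the viewport and may be scaled by the device pixel ratio. The function name screenshot_box and its parameters are hypothetical.

```python
def screenshot_box(rect, scroll_x=0, scroll_y=0, device_pixel_ratio=1.0):
    """Convert a document-relative rect in CSS pixels into a
    (left, top, right, bottom) box in screenshot pixels.

    rect is a dict with x, y, width and height.
    scroll_x / scroll_y are the current page scroll offsets.
    """
    # Shift from document coordinates to viewport coordinates, then scale.
    left = (rect["x"] - scroll_x) * device_pixel_ratio
    top = (rect["y"] - scroll_y) * device_pixel_ratio
    right = left + rect["width"] * device_pixel_ratio
    bottom = top + rect["height"] * device_pixel_ratio
    return tuple(round(value) for value in (left, top, right, bottom))
```

The scroll offsets and the pixel ratio could be read with driver.execute_script('return [window.pageXOffset, window.pageYOffset, window.devicePixelRatio]'), and the resulting box has the shape that image libraries usually expect for a crop.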

Do not use headless mode

Be aware that the location and size of an element in a headless browser may differ from those in a non-headless browser.

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By

chrome_options = Options()
## remove
# chrome_options.add_argument('--headless')
chrome_options.add_argument('--no-sandbox')
chrome_options.add_argument('--disable-dev-shm-usage')
# The 'chrome_options=' keyword is deprecated in favour of 'options='
driver = webdriver.Chrome(options=chrome_options)
driver.get(URL)
fn = lambda X: driver.execute_script('return document.body.parentNode.scroll' + X)
driver.set_window_size(1024, fn('Height'))
driver.save_screenshot("sample.png")
tables = driver.find_elements(By.TAG_NAME, "table")
for table in tables:
    table_str = table.get_attribute("innerHTML")
    similarity_tables = similarity(my_table_words, table_str)
    if similarity_tables > 90:
        th = table.size['height']
        tw = table.size['width']
        tx = table.location['x']
        ty = table.location['y']
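
One way the two modes can diverge is the window size that is actually applied: a visible window is usually limited by the screen, so a request for the full page height may not be honored. This is an assumption about your setup, not something I have verified on your page. A small sketch to check it follows; resize_and_report is a hypothetical helper, and driver is anything that exposes Selenium's set_window_size and get_window_size methods.

```python
def resize_and_report(driver, width, height):
    """Request a window size and report what the browser actually applied."""
    driver.set_window_size(width, height)
    actual = driver.get_window_size()  # dict with 'width' and 'height'
    applied = (actual["width"], actual["height"])
    return {
        "requested": (width, height),
        "actual": applied,
        # False means the browser clamped or otherwise changed the size.
        "honored": applied == (width, height),
    }
```

Calling resize_and_report(driver, 1024, fn('Height')) in both modes and comparing the "actual" values shows whether the layouts being measured are the same size to begin with.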

If all else fails

If you have tried all of the above and none of it works, extract the size yourself and adjust the values manually.
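
One way to extract the geometry yourself is to ask the browser directly through JavaScript. The sketch below uses the standard getBoundingClientRect DOM method plus the page scroll offsets to obtain document-relative coordinates. The names RECT_SCRIPT and measure_with_js are hypothetical; driver is assumed to expose Selenium's execute_script method and element is the matched table.

```python
RECT_SCRIPT = """
const r = arguments[0].getBoundingClientRect();
return {
    x: r.left + window.pageXOffset,
    y: r.top + window.pageYOffset,
    width: r.width,
    height: r.height
};
"""


def measure_with_js(driver, element):
    """Return the element's document-relative rect as a dict of floats."""
    rect = driver.execute_script(RECT_SCRIPT, element)
    # Keep only the four geometry keys and normalise them to floats.
    return {key: float(rect[key]) for key in ("x", "y", "width", "height")}
```

Comparing measure_with_js(driver, table) with table.location and table.size tells you whether the discrepancy comes from Selenium's values or from how they are used afterwards.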

Hope this helps.

1 Comment

Thanks, adding the time.sleep helped resolve the problem!
