How to get_attribute('innerHTML') from a list of URLs - Selenium?

Question

I am web scraping with Selenium in Python, using an XPath to extract part of each page's contents.

I want to know how to loop over a list of URLs, extract the content from each one, and save the results into a dictionary.

mylist_URLs = ['https://www.sec.gov/cgi-bin/own-disp?action=getowner&CIK=0001560258',
'https://www.sec.gov/cgi-bin/own-disp?action=getissuer&CIK=0000034088',
'https://www.sec.gov/cgi-bin/own-disp?action=getissuer&CIK=0001048911']

My code below only works for one URL:

from selenium import webdriver

driver = webdriver.Chrome(r'xxx\chromedriver.exe')
driver.get('https://www.sec.gov/cgi-bin/own-disp?action=getowner&CIK=0000104169')

driver.find_elements_by_xpath('/html/body/div/table[1]/tbody/tr[2]/td/table/tbody/tr[1]/td')[0].get_attribute('innerHTML')

Thank you for the help.

supputuri · Accepted Answer · 2019-07-17 14:02:21Z

You can use a simple for loop with WebDriverWait to make sure the table has loaded before reading its innerHTML.

Add the imports below:

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

Script:

mylist_URLs = ['https://www.sec.gov/cgi-bin/own-disp?action=getowner&CIK=0001560258',
'https://www.sec.gov/cgi-bin/own-disp?action=getissuer&CIK=0000034088',
'https://www.sec.gov/cgi-bin/own-disp?action=getissuer&CIK=0001048911']
# open the browser
driver = webdriver.Chrome(r'xxx\chromedriver.exe')
# iterate through all the urls
for url in mylist_URLs:
    print(url)
    driver.get(url)
    # wait for the table cell to be present in the DOM
    element = WebDriverWait(driver, 30).until(EC.presence_of_element_located((By.XPATH, "(//table[1]/tbody/tr[2]/td/table/tbody/tr[1]/td)[1]")))
    # now get the element innerHTML
    print(element.get_attribute('innerHTML'))
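
The question also asks about saving the results into a dictionary, which the loop above only prints. One possible way to do that is sketched below; it is not part of the original answer. The names collect_inner_html, make_selenium_fetch and XPATH are hypothetical helpers introduced here for illustration, and storing None for a page that fails is an assumption about how you might want to handle errors.

```python
from typing import Callable, Dict, Iterable, Optional

# Same XPath as in the loop above.
XPATH = "(//table[1]/tbody/tr[2]/td/table/tbody/tr[1]/td)[1]"


def collect_inner_html(urls: Iterable[str],
                       fetch: Callable[[str], str]) -> Dict[str, Optional[str]]:
    """Map each URL to whatever fetch(url) returns.

    Assumption: a URL whose fetch raises is stored as None, so one
    slow or broken page does not abort the whole run.
    """
    results: Dict[str, Optional[str]] = {}
    for url in urls:
        try:
            results[url] = fetch(url)
        except Exception:
            results[url] = None
    return results


def make_selenium_fetch(driver, timeout: int = 30) -> Callable[[str], str]:
    """Build a fetch function around an already-open Selenium driver."""
    # Imported here so collect_inner_html can be tried without Selenium.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    def fetch(url: str) -> str:
        driver.get(url)
        # Wait until the table cell is present, then read its innerHTML.
        element = WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.XPATH, XPATH))
        )
        return element.get_attribute("innerHTML")

    return fetch


# Usage with the driver and list from the script above:
#   results = collect_inner_html(mylist_URLs, make_selenium_fetch(driver))
#   results[mylist_URLs[0]]  # innerHTML for the first URL, or None on failure
```

Keying the dictionary by URL keeps each result tied to the page it came from. If you would rather see the error than get a None entry, drop the try/except.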

edited Jul 17, 2019 at 14:02

answered Jul 17, 2019 at 3:50

supputuri

4 Comments

Arthur Morgan Over a year ago

I received the "SyntaxError: unexpected EOF while parsing #element.get_attribute('innerHTML')". I also found "SyntaxError: unexpected EOF while parsing element = WebDriverWait(driver,30).until(EC.presence_of_element_located((By.XPATH, "(//table[1]/tbody/tr[2]/td/table/tbody/tr[1]/td)[1]"))"

supputuri Over a year ago

updated the answer with the missing ) at the end of the line.

Arthur Morgan Over a year ago

Hmmm. Still received "SyntaxError: invalid syntax : print(element.get_attribute('innerHTML'))) "

Arthur Morgan Over a year ago

Solved! There should be a ")" after element = WebDriverWait(driver,30).until(EC.presence_of_element_located((By.XPATH, "(//table[1]/tbody/tr[2]/td/table/tbody/tr[1]/td)[1]"))
