I am web scraping with Selenium in Python, using XPath to extract part of a page's content.

I want to know how to loop over a list of URLs, extract the content from each one, and save the results in a dictionary.

mylist_URLs = ['https://www.sec.gov/cgi-bin/own-disp?action=getowner&CIK=0001560258',
'https://www.sec.gov/cgi-bin/own-disp?action=getissuer&CIK=0000034088',
'https://www.sec.gov/cgi-bin/own-disp?action=getissuer&CIK=0001048911']

My code below only works for one URL:

driver = webdriver.Chrome(r'xxx\chromedriver.exe')
driver.get('https://www.sec.gov/cgi-bin/own-disp?action=getowner&CIK=0000104169')

driver.find_elements_by_xpath('/html/body/div/table[1]/tbody/tr[2]/td/table/tbody/tr[1]/td')[0].get_attribute('innerHTML')

Thank you for the help.

1 Answer

You can use a simple for loop with WebDriverWait to make sure the table is loaded before getting the innerHTML.

Add the imports below:

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

Script:

mylist_URLs = ['https://www.sec.gov/cgi-bin/own-disp?action=getowner&CIK=0001560258',
'https://www.sec.gov/cgi-bin/own-disp?action=getissuer&CIK=0000034088',
'https://www.sec.gov/cgi-bin/own-disp?action=getissuer&CIK=0001048911']
# open the browser
driver = webdriver.Chrome(r'xxx\chromedriver.exe')
# iterate through all the urls
for url in mylist_URLs:
    print(url)
    driver.get(url)
    # wait up to 30 seconds for the table cell to be present in the DOM
    element = WebDriverWait(driver,30).until(EC.presence_of_element_located((By.XPATH, "(//table[1]/tbody/tr[2]/td/table/tbody/tr[1]/td)[1]")))
    # now get the element innerHTML
    print(element.get_attribute('innerHTML'))
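
The question also asks about saving the results into a dictionary, which the loop above does not do. One possible way is sketched below, keyed by URL. The names collect_inner_html, selenium_fetcher and fetch are hypothetical helpers made up for this illustration, not part of Selenium, and storing None for a URL that fails is just one assumed way of handling errors.

```python
def collect_inner_html(urls, fetch):
    """Map each URL to whatever fetch(url) returns, or None if fetch raises."""
    results = {}
    for url in urls:
        try:
            results[url] = fetch(url)
        except Exception as exc:  # e.g. a Selenium TimeoutException
            print(f"skipping {url}: {exc!r}")
            results[url] = None
    return results


def selenium_fetcher(driver, xpath, timeout=30):
    """Build a fetch(url) function backed by an already-open Selenium driver."""
    # Imported here so collect_inner_html can be tried without Selenium installed.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    def fetch(url):
        driver.get(url)
        # Wait until the target cell is present in the DOM, then read it.
        element = WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.XPATH, xpath))
        )
        return element.get_attribute("innerHTML")

    return fetch
```

With the driver and mylist_URLs from the script above, this could be called as results = collect_inner_html(mylist_URLs, selenium_fetcher(driver, "(//table[1]/tbody/tr[2]/td/table/tbody/tr[1]/td)[1]")), after which results[url] holds the innerHTML for each URL that loaded in time.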

4 Comments

I received the "SyntaxError: unexpected EOF while parsing #element.get_attribute('innerHTML')". I also found "SyntaxError: unexpected EOF while parsing element = WebDriverWait(driver,30).until(EC.presence_of_element_located((By.XPATH, "(//table[1]/tbody/tr[2]/td/table/tbody/tr[1]/td)[1]"))"
updated the answer with the missing ) at the end of the line.
Hmmm. Still received "SyntaxError: invalid syntax : print(element.get_attribute('innerHTML'))) "
Solved! There should be a ")" after element = WebDriverWait(driver,30).until(EC.presence_of_element_located((By.XPATH, "(//table[1]/tbody/tr[2]/td/table/tbody/tr[1]/td)[1]"))
