Good day, I am new to Python and Selenium and have been searching for a solution for a while now. While some answers come close, I can't seem to find one that solves my problem. The snippet of my code that is causing the problem is as follows:
    for url in links:
        driver.get(url)
        company = driver.find_elements_by_xpath("""//*[contains(@id, 'node')]/div[2]/ul/li/div/div[1]/span""")
        date = driver.find_elements_by_xpath("""//*[contains(@id, 'node')]/div[1]/div[1]/div[2]/div/span""")
        title = driver.find_elements_by_xpath("""//*[@id="page-title"]/span""")
        urlinf = driver.current_url #url info
        num_page_items = len(date)
        for i in range(num_page_items):
            df = df.append({'Company': company[i].text, 'Date': date[i].text, 'Title': title[i].text, 'URL': urlinf[i]}, ignore_index=True)
This works when all elements are present (I can see the output in the pandas DataFrame), but if one of the elements doesn't exist (either 'date' or 'title'), Python raises the error:
IndexError: list index out of range
What I have tried so far:
1) Created a try/except (doesn't work)
2) Tried if/else (if the variable is not "")
I would like to insert "Null" whenever an element doesn't exist, so that the pandas DataFrame is populated with "Null" for the missing value instead of the loop failing.
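A minimal sketch of the behaviour described above, under stated assumptions: `FakeElement` and `text_or_null` are hypothetical names used only for illustration, the lists stand in for what `find_elements_by_xpath` would return, and the URL is a placeholder. It is not a tested fix for the real page.

```python
from dataclasses import dataclass


@dataclass
class FakeElement:
    """Hypothetical stand-in for a Selenium WebElement; only .text is modelled."""
    text: str


def text_or_null(elements, i, default="Null"):
    """Return elements[i].text, or `default` when the list has no item at i."""
    return elements[i].text if i < len(elements) else default


# Simulated results of the three lookups on one page (assumed data).
company = [FakeElement("Acme"), FakeElement("Globex")]
date = [FakeElement("2019-01-01"), FakeElement("2019-01-02")]
title = []  # nothing matched, so the lookup produced an empty list
url = "https://example.com/page"  # placeholder for driver.current_url

# Iterate up to the longest list so that no field is silently dropped.
n = max(len(company), len(date), len(title))
rows = [
    {
        "Company": text_or_null(company, i),
        "Date": text_or_null(date, i),
        "Title": text_or_null(title, i),
        "URL": url,  # a single string per page, so it is not indexed
    }
    for i in range(n)
]
```

The resulting `rows` list could then be passed to `pandas.DataFrame(rows)` once, after the loop over all pages. That also avoids `DataFrame.append`, which was removed in pandas 2.0.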
Any assistance and guidance would be greatly appreciated.
EDIT 1:
I have tried the following:
    for url in links:
        driver.get(url)
        try:
            company = driver.find_elements_by_xpath("""//*[contains(@id, 'node')]/div[2]/ul/li/div/div[1]/span""")
            date = driver.find_elements_by_xpath("""//*[contains(@id, 'node')]/div[1]/div[1]/div[2]/div/span""")
            title = driver.find_elements_by_xpath("""//*[@id="page-title"]/span""")
            urlinf = driver.current_url #url info
        except:
            pass
        num_page_items = len(date)
        for i in range(num_page_items):
            df = df.append({'Company': company[i].text, 'Date': date[i].text, 'Title': title[i].text, 'URL': urlinf[i]}, ignore_index=True)
and:
    for url in links:
        driver.get(url)
        try:
            company = driver.find_elements_by_xpath("""//*[contains(@id, 'node')]/div[2]/ul/li/div/div[1]/span""")
            date = driver.find_elements_by_xpath("""//*[contains(@id, 'node')]/div[1]/div[1]/div[2]/div/span""")
            title = driver.find_elements_by_xpath("""//*[@id="page-title"]/span""")
            urlinf = driver.current_url #url info
        except (NoSuchElementException, ElementNotVisibleException, InvalidSelectorException):
            pass
        num_page_items = len(date)
        for i in range(num_page_items):
            df = df.append({'Company': company[i].text, 'Date': date[i].text, 'Title': title[i].text, 'URL': urlinf[i]}, ignore_index=True)
and:
    for url in links:
        driver.get(url)
        try:
            company = driver.find_elements_by_xpath("""//*[contains(@id, 'node')]/div[2]/ul/li/div/div[1]/span""")
            date = driver.find_elements_by_xpath("""//*[contains(@id, 'node')]/div[1]/div[1]/div[2]/div/span""")
            title = driver.find_elements_by_xpath("""//*[@id="page-title"]/span""")
            urlinf = driver.current_url #url info
        except:
            i = 'Null'
            pass
        num_page_items = len(date)
        for i in range(num_page_items):
            df = df.append({'Company': company[i].text, 'Date': date[i].text, 'Title': title[i].text, 'URL': urlinf[i]}, ignore_index=True)
I also tried the same try/except around the line that appends to the pandas DataFrame.
EDIT 2:
The error I get:
    IndexError: list index out of range
is attributed to the line:
    df = df.append({'Company': company[i].text, 'Date': date[i].text, 'Title': title[i].text, 'URL': urlinf[i]}, ignore_index=True)
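One possible reading of why the try/except attempts never trigger, offered as an assumption rather than a confirmed diagnosis: `find_elements_by_xpath` returns an empty list when nothing matches instead of raising, so the exception only appears later, when the shorter list is indexed. The sketch below reproduces that with plain Python lists; `find_elements_stub` is a hypothetical stand-in, not a Selenium call, and the data is made up.

```python
def find_elements_stub(matches):
    """Hypothetical stand-in for a find_elements_* call: returns a list, never raises."""
    return list(matches)


# Wrapping the lookups in try/except catches nothing, because nothing is raised here.
raised_during_lookup = False
try:
    date = find_elements_stub(["2019-01-01"])
    title = find_elements_stub([])  # no match: an empty list, not an exception
except Exception:
    raised_during_lookup = True

# The failure only shows up when the empty list is indexed.
error_message = None
try:
    row = {"Date": date[0], "Title": title[0]}
except IndexError as exc:
    error_message = str(exc)

# Separately, indexing a string (such as the value of driver.current_url)
# yields a single character rather than the whole URL.
urlinf = "https://example.com/page"  # placeholder URL
first_char = urlinf[0]
```

If this reading is right, the check has to happen at the point where each list is indexed, per field, rather than around the lookups themselves.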