Good day. I am new to Python and Selenium and have been searching for a solution for a while now. Some answers come close, but I can't seem to find one that solves my problem. The part of my code that causes the problem is as follows:

for url in links:
        driver.get(url)
        company = driver.find_elements_by_xpath("""//*[contains(@id, 'node')]/div[2]/ul/li/div/div[1]/span""")
        date = driver.find_elements_by_xpath("""//*[contains(@id, 'node')]/div[1]/div[1]/div[2]/div/span""")
        title = driver.find_elements_by_xpath("""//*[@id="page-title"]/span""")
        urlinf = driver.current_url #url info

        num_page_items = len(date)

        for i in range(num_page_items):
            df = df.append({'Company': company[i].text, 'Date': date[i].text, 'Title': title[i].text, 'URL': urlinf[i]}, ignore_index=True)

This works if all elements are present (and I can see the output in the Pandas DataFrame), but if one of the elements doesn't exist (either 'date' or 'title'), Python raises the error:

IndexError: list index out of range

What I have tried so far:

1) a try/except (doesn't work)
2) an if/else (checking whether the variable is not "")

I would like to insert "Null" when an element doesn't exist, so that the Pandas DataFrame is populated with "Null" for the missing value.

Any assistance and guidance would be greatly appreciated.

EDIT 1:

I have tried the following:

for url in links:
    driver.get(url)
    try:
        company = driver.find_elements_by_xpath("""//*[contains(@id, 'node')]/div[2]/ul/li/div/div[1]/span""")
        date = driver.find_elements_by_xpath("""//*[contains(@id, 'node')]/div[1]/div[1]/div[2]/div/span""")
        title = driver.find_elements_by_xpath("""//*[@id="page-title"]/span""")
        urlinf = driver.current_url #url info
    except:
        pass
    num_page_items = len(date)

    for i in range(num_page_items):
        df = df.append({'Company': company[i].text, 'Date': date[i].text, 'Title': title[i].text, 'URL': urlinf[i]}, ignore_index=True)

and:

for url in links:
    driver.get(url)
    try:
        company = driver.find_elements_by_xpath("""//*[contains(@id, 'node')]/div[2]/ul/li/div/div[1]/span""")
        date = driver.find_elements_by_xpath("""//*[contains(@id, 'node')]/div[1]/div[1]/div[2]/div/span""")
        title = driver.find_elements_by_xpath("""//*[@id="page-title"]/span""")
        urlinf = driver.current_url #url info
    except (NoSuchElementException, ElementNotVisibleException, InvalidSelectorException):
        pass

    num_page_items = len(date)

    for i in range(num_page_items):
        df = df.append({'Company': company[i].text, 'Date': date[i].text, 'Title': title[i].text, 'URL': urlinf[i]}, ignore_index=True)

and:

for url in links:
    driver.get(url)
    try:
        company = driver.find_elements_by_xpath("""//*[contains(@id, 'node')]/div[2]/ul/li/div/div[1]/span""")
        date = driver.find_elements_by_xpath("""//*[contains(@id, 'node')]/div[1]/div[1]/div[2]/div/span""")
        title = driver.find_elements_by_xpath("""//*[@id="page-title"]/span""")
        urlinf = driver.current_url #url info
    except:
        i = 'Null'
        pass

    num_page_items = len(date)

    for i in range(num_page_items):
        df = df.append({'Company': company[i].text, 'Date': date[i].text, 'Title': title[i].text, 'URL': urlinf[i]}, ignore_index=True)

I also tried the same try/except at the point of appending to Pandas.

EDIT 2: The error I get:

IndexError: list index out of range

is attributed to the line:

df = df.append({'Company': company[i].text, 'Date': date[i].text, 'Title': title[i].text, 'URL': urlinf[i]}, ignore_index=True)

  • Can you show your attempts with the try/except? That is the best way to handle error messages and ignore them if needed. Commented Nov 22, 2018 at 6:57
  • I've tried quite a few iterations and overwrote them when I found they didn't work, but I have added what I have tried to my question. Commented Nov 22, 2018 at 7:36
  • I'll take a look... Commented Nov 22, 2018 at 10:10
  • I posted an answer. Let me know if you need any other assistance! Commented Nov 22, 2018 at 10:27

1 Answer

As your error shows, you have an IndexError.

To overcome it, add a try/except around the code that raises the error.

Also, you are using driver.current_url, which returns the URL as a string, but in your inner for loop you index it as if it were a list. This may be the origin of your error.
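To illustrate the point about current_url, here is a small sketch with a made-up URL standing in for driver.current_url (the value is an assumption, not taken from the question). Indexing a string returns single characters, and going past the end raises an IndexError whose message mentions "string", not "list". Since the traceback in the question says "list index out of range", the failing lookup is more likely one of the element lists, although urlinf[i] would still store the wrong value.

```python
# Hypothetical stand-in for driver.current_url, which returns a plain string.
urlinf = "https://example.com/node/1"

# Indexing a string gives one character, not the URL.
first_char = urlinf[0]
print(first_char)  # h

# Indexing past the end of a string raises IndexError with a "string" message.
try:
    urlinf[len(urlinf)]
except IndexError as exc:
    message = str(exc)
    print("IndexError:", message)  # IndexError: string index out of range
```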

In your case try this:

for url in links:
    driver.get(url)
    company = driver.find_elements_by_xpath("""//*[contains(@id, 'node')]/div[2]/ul/li/div/div[1]/span""")
    date = driver.find_elements_by_xpath("""//*[contains(@id, 'node')]/div[1]/div[1]/div[2]/div/span""")
    title = driver.find_elements_by_xpath("""//*[@id="page-title"]/span""")
    urlinf = driver.current_url #url info

    num_page_items = len(date)
    for i in range(num_page_items):
        try:
            df = df.append({'Company': company[i].text, 'Date': date[i].text, 'Title': title[i].text, 'URL': urlinf}, ignore_index=True)
        except IndexError:
            df.append(None) # or df.append('Null')

Hope you find this helpful!
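If the goal is a row that contains "Null" for whichever element is missing, one option is to guard each lookup separately instead of the whole append. The sketch below is only an illustration under stated assumptions: text_or_null and build_rows are hypothetical helper names, FakeElement stands in for a Selenium WebElement so the example runs without a browser, and looping over the longest list (rather than len(date)) is my own choice so that a page with no date still produces a row.

```python
NULL = "Null"  # placeholder requested in the question


class FakeElement:
    """Hypothetical stand-in for a Selenium WebElement (only .text is used)."""

    def __init__(self, text):
        self.text = text


def text_or_null(elements, i, default=NULL):
    """Return elements[i].text, or the default when the list is too short."""
    try:
        return elements[i].text
    except IndexError:
        return default


def build_rows(company, date, title, url):
    """Build one dict per item, filling missing fields with the placeholder."""
    # Assumption: iterate over the longest list so no found element is dropped.
    count = max(len(company), len(date), len(title))
    return [
        {
            "Company": text_or_null(company, i),
            "Date": text_or_null(date, i),
            "Title": text_or_null(title, i),
            "URL": url,  # current_url is a string, so it is used as-is
        }
        for i in range(count)
    ]


# Example page where the date element was not found.
rows = build_rows(
    company=[FakeElement("Acme")],
    date=[],
    title=[FakeElement("Report")],
    url="https://example.com/a",
)
print(rows)
```

The resulting list of dicts can be collected across all pages and passed to pandas.DataFrame(rows) once at the end. That also avoids DataFrame.append, which was deprecated and then removed in pandas 2.0.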

3 Comments

This solution works! Thank you very much, I really appreciate it.
Just as a matter of interest, I tried df.append('Null') and got this error message: TypeError: cannot concatenate object of type "<type 'str'>"; only pd.Series, pd.DataFrame, and pd.Panel (deprecated) objs are valid
Just an update to this: I decided to write directly to a CSV. With the original solution, the None / Null was creating a line break instead of setting the variable to "Null", so I added the following: blank = "blank" and except IndexError: with open('results.csv', 'a') as f: f.write(blank). However, my data in the CSV is getting offset by the missing value. Would you suggest adding if statements in the loop to check whether the variable = ""?
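On the CSV offset described in the last comment: writing a bare placeholder with f.write adds text without the surrounding commas, so the columns shift. A possible alternative, sketched here under assumptions, is csv.DictWriter from the standard library, whose restval argument fills in any field missing from a row. The field names and the write_rows helper are illustrative, and an in-memory buffer is used in place of results.csv so the example is self-contained.

```python
import csv
import io

FIELDS = ["Company", "Date", "Title", "URL"]  # assumed column order


def write_rows(handle, rows):
    """Write dict rows as CSV, using "Null" for any missing field."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS, restval="Null")
    writer.writeheader()
    writer.writerows(rows)


# In-memory buffer standing in for the results.csv file.
buffer = io.StringIO()
write_rows(
    buffer,
    [{"Company": "Acme", "Title": "Report", "URL": "https://example.com/a"}],
)
lines = buffer.getvalue().splitlines()
print(lines)
```

With a real file, the handle would come from something like open('results.csv', 'a', newline=''), and the header would normally be written only once rather than on every append.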
