Python Selenium Webdriver Table Values to Excel

Question

I am working on a project scraping a table off a web site. I will not be able to give full code as this is a company specific site with a login, hence my choice of Selenium. I have located the table in the HTML code:

class Table:
    def __init__(self, driver):
        self.driver = driver
    def get_row_info(self):
        table_id = self.driver.find_element(By.ID, 'dgTickets')
        rows = table_id.find_elements(By.TAG_NAME, "tr")
        col = []
        i = 0
        for i in rows[0]:
            i+=1
            name = i.text()
            col.append((name, []))
        for j in range(1,len(rows)):
            T = rows[j]
            i = 0
            for t in T.iterchildren():
                data = t.text_content()
                if i>0:
                    try:
                        data = int(data)
                    except:
                        pass
                col[i][1].append(data)
                i+=1
        Dict = {title:column for (title, column) in col}

This raises an error saying that the value is not iterable.

I think what I am trying to do here is relatively self explanatory. Primarily, I am trying to return the web table and eventually get it into a pandas dataframe for parsing. Using various methods, I can get the columns to print out their texts, but there seems to be a problem with passing that to a specified column in a table. Here is one way that I have found to return the column:

        for row in rows:
            col0 = row.find_elements(By.TAG_NAME, "td")[0]

I'm honestly a little lost at this point. Any suggestions for me?

Always put the full error message (starting at the word "Traceback") in the question (not in a comment) as text (not as a screenshot). It contains other useful information.
– furas, Commented Feb 20, 2020 at 3:10
First suggestion: always show the full error message, because it shows which line the error is on. We can't run the code, so we can't see the errors ourselves.
– furas, Commented Feb 20, 2020 at 3:12
Second suggestion: use print() and print(type(...)) to see what you have in your variables, or learn how to use a debugger.
– furas, Commented Feb 20, 2020 at 3:13
Some of the code makes no sense, e.g. for i in rows[0]: i+=1. rows is a list, but you take its first element, rows[0], and then try to iterate over it as if it were a list with for i in rows[0]. Even if that gave you i, you then treat it as a number with i += 1 and later as an object with i.text(). If you are using i += 1 to get the next element, that is wrong. Or maybe you are trying to use the same variable for two different things (i = 0 and for i in rows[0]), but Python can't keep two different values in the same variable.
– furas, Commented Feb 20, 2020 at 3:17
BTW: if the page uses standard tags to create the table, then you can use pandas.read_html() to get all tables on the page as a list of DataFrames.
– furas, Commented Feb 20, 2020 at 3:20
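
The loop problem described in the comments above can be sketched without a browser. This is only an illustration: FakeCell is a hypothetical stand-in for a Selenium element that models nothing but the .text attribute, and the header names are made up. It shows iterating over the cells of the header row (rather than over the row itself), letting enumerate() supply the counter, and reading .text without parentheses.

```python
class FakeCell:
    """Hypothetical stand-in for a Selenium WebElement; only .text is modelled."""

    def __init__(self, text):
        # On a real WebElement, .text is a property, not a method.
        self.text = text


# With a real driver this list would come from something like
# rows[0].find_elements(By.TAG_NAME, "th"); the names here are invented.
header_cells = [FakeCell("Ticket"), FakeCell("Status"), FakeCell("Hours")]

col = []
# enumerate() provides the index, so the loop variable stays free for the element.
for index, cell in enumerate(header_cells):
    name = cell.text  # no parentheses: cell.text() would raise a TypeError
    col.append((name, []))
    print(index, name)
```

Keeping the counter and the element in separate names avoids the clash between i = 0 and for i in rows[0] in the question's code.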

furas · Accepted Answer · 2020-02-20 04:01:03Z

You can use pandas.read_html() to get all tables on the page as a list of DataFrames. It is very fast.

from io import StringIO

import selenium.webdriver
import pandas as pd

url = 'https://en.wikipedia.org/wiki/List_of_S%26P_500_companies'

driver = selenium.webdriver.Firefox()
driver.get(url)

# --- get table ---

# wrap the HTML in StringIO: newer pandas versions deprecate passing a raw HTML string
all_tables = pd.read_html(StringIO(driver.page_source), attrs={'id': 'constituents'})
df = all_tables[0]

# --- show it ---

print(df)

You can also do it manually, but for this example it takes much longer.

import selenium.webdriver
from selenium.webdriver.common.by import By
import pandas as pd

url = 'https://en.wikipedia.org/wiki/List_of_S%26P_500_companies'

driver = selenium.webdriver.Firefox()
driver.get(url)

# --- get table ---

headers = []
columns = dict()

# the find_element_by_* helpers were removed in recent Selenium 4 releases,
# so By locators are used here (as in the question)
table_id = driver.find_element(By.ID, 'constituents')
all_rows = table_id.find_elements(By.TAG_NAME, "tr")

# --- headers ---

row = all_rows[0]
all_items = row.find_elements(By.TAG_NAME, "th")
for item in all_items:
    name = item.text
    columns[name] = []
    headers.append(name)

print(headers)

# --- data ---

for row in all_rows[1:]:
    all_items = row.find_elements(By.TAG_NAME, "td")
    for name, item in zip(headers, all_items):
        value = item.text
        columns[name].append(value)

df = pd.DataFrame(columns)

# --- show it ---

print(df)
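
Since the title asks about Excel, here is a minimal sketch of the remaining steps, assuming the cell texts have already been read from the page into plain lists. The helper names (to_number, build_columns, save_to_excel) and the sample data are hypothetical, not part of the answer above. It converts whole-number cells to int, as the question tried to do, and pads short rows so every column has the same length, since pd.DataFrame() raises a ValueError for lists of unequal length.

```python
def to_number(text):
    """Return int(text) when the cell holds a whole number, else the stripped text."""
    try:
        return int(text)
    except ValueError:
        return text.strip()


def build_columns(headers, rows):
    """Turn a list of row lists into a {header: column values} dict."""
    columns = {name: [] for name in headers}
    for row in rows:
        # Pad short rows with empty strings so all columns stay the same length.
        padded = list(row) + [""] * (len(headers) - len(row))
        for name, cell in zip(headers, padded):
            columns[name].append(to_number(cell))
    return columns


def save_to_excel(columns, path):
    """Write the columns to an .xlsx file (assumes pandas and openpyxl are installed)."""
    import pandas as pd

    pd.DataFrame(columns).to_excel(path, index=False)


# Made-up sample data standing in for the texts of the <th> and <td> elements.
headers = ["Ticket", "Owner", "Hours"]
rows = [["101", "Ann", "3"], ["102", "Bob"]]
columns = build_columns(headers, rows)
print(columns)
```

Calling save_to_excel(columns, "tickets.xlsx") would then write the file; the file name is only an example. A DataFrame obtained from pandas.read_html() can be written the same way with df.to_excel().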
