
So I have extracted data from a table using the BeautifulSoup library with the code below:

    if soup.find("table", {"class": "a-keyvalue prodDetTable"}) is not None:
        table = parse_table(soup.find("table", {"class": "a-keyvalue prodDetTable"}))
        df = pd.DataFrame(table)
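For context, `parse_table` is a small helper of mine; a minimal sketch of what such a helper could look like (this implementation is illustrative, not the original code) would be:

```python
from bs4 import BeautifulSoup

def parse_table(table):
    # Collect each <tr> as a {name, value} dict (illustrative, not the original helper)
    rows = []
    for tr in table.find_all("tr"):
        cells = [c.get_text(strip=True) for c in tr.find_all(["th", "td"])]
        if len(cells) == 2:  # key/value spec tables have two cells per row
            rows.append({"name": cells[0], "value": cells[1]})
    return rows

html = """<table class="a-keyvalue prodDetTable">
<tr><th>Battery Voltage</th><td>12-volt max</td></tr>
<tr><th>Charger Included</th><td>Yes</td></tr>
</table>"""
soup = BeautifulSoup(html, "html.parser")
table = parse_table(soup.find("table", {"class": "a-keyvalue prodDetTable"}))
print(table)
```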

This worked: I get the table and parse it out into a dataframe. However, I am trying to do something similar on a different website using Selenium, and here is my code so far:

driver = webdriver.Chrome()
i = "DCD710S2"
base_url = str("https://www.lowes.com/search?searchTerm=" + str(i))
driver.get(base_url)
table = driver.find_element_by_xpath("//*[@id='collapseSpecs']/div/div/div[1]/table/tbody")

So I am getting to the table, and I tried `get_attribute('innerHTML')` and some other `get_attribute` calls, but I am unable to get the table as-is into pandas. Any suggestions on how to handle that with Selenium?

Here is how the HTML looks: [screenshot of the table markup omitted]

3 Answers


Use pandas to fetch the tables. Try the following code.

import pandas as pd
import time
from selenium import webdriver
from bs4 import BeautifulSoup

driver = webdriver.Chrome()
i = "DCD710S2"
base_url = "https://www.lowes.com/search?searchTerm=" + str(i)
driver.get(base_url)
time.sleep(3)
html = driver.page_source
soup = BeautifulSoup(html, 'html.parser')
div = soup.select_one("div#collapseSpecs")
table = pd.read_html(str(div))
print(table[0])
print(table[1])

Output:

                                      0                     1
0                     Battery Amp Hours                   1.3
1                     Tool Power Output               189 UWO
2                  Side Handle Included                    No
3             Number of Clutch Settings                    15
4                             Case Type                  Soft
5                           Series Name                   NaN
6                    Tool Weight (lbs.)                   2.2
7                  Tool Length (Inches)                   7.5
8                   Tool Width (Inches)                   2.0
9                  Tool Height (Inches)                  7.75
10  Forward and Reverse Switch Included                   Yes
11                            Sub-Brand                   NaN
12                         Battery Type  Lithium ion (Li-ion)
13                      Battery Voltage           12-volt max
14                     Charger Included                   Yes
15                       Variable Speed                   Yes
                                   0               1
0                 Maximum Chuck Size          3/8-in
1       Number of Batteries Included               2
2                   Battery Warranty  3-year limited
3                Maximum Speed (RPM)          1500.0
4            Bluetooth Compatibility              No
5              Charge Time (Minutes)              40
6                  App Compatibility              No
7                     Works with iOS              No
8                          Brushless              No
9   CA Residents: Prop 65 Warning(s)             Yes
10                     Tool Warranty  3-year limited
11                            UNSPSC        27112700
12                Works with Android              No
13                  Battery Included             Yes
14                       Right Angle              No
15               Wi-Fi Compatibility              No

If you want a single dataframe, try this.

import pandas as pd
import time
from selenium import webdriver
from bs4 import BeautifulSoup

driver = webdriver.Chrome()
i = "DCD710S2"
base_url = "https://www.lowes.com/search?searchTerm=" + str(i)
driver.get(base_url)
time.sleep(3)
html = driver.page_source
soup = BeautifulSoup(html, 'html.parser')
div = soup.select_one("div#collapseSpecs")
table = pd.read_html(str(div))
frames = [table[0], table[1]]
result = pd.concat(frames, ignore_index=True)
print(result)

A Selenium-only option, building the pandas DataFrame directly:

import pandas as pd
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

spec_name = []
spec_item = []
driver = webdriver.Chrome()
i = "DCD710S2"
base_url = "https://www.lowes.com/search?searchTerm=" + str(i)
driver.get(base_url)
tables = WebDriverWait(driver, 20).until(EC.presence_of_all_elements_located((By.XPATH, "//div[@id='collapseSpecs']//table")))
for table in tables:
    for row in table.find_elements_by_xpath(".//tr"):
        spec_name.append(row.find_element_by_xpath('./th').get_attribute('textContent'))
        spec_item.append(row.find_element_by_xpath('./td/span').get_attribute('textContent'))

df = pd.DataFrame({"Spec_Name": spec_name, "Spec_Title": spec_item})

print(df)

7 Comments

KunduK - Thank you so much, I figured I could use that. I was just wondering if there is anything in Selenium I can use with .get_attribute().
@Slavisha84 : so you are after selenium solution as well?
Yeah, I was trying to see if I could write it as simply as possible without involving BeautifulSoup, but if Selenium doesn't have something simple then I will just use BeautifulSoup. I saw a few people do this by counting the number of rows and columns, then building the table with Selenium and reading it row by row, column by column, but that gets too complex and confusing.
@Slavisha84 : Updated selenium option as well.
This works perfectly on this item. But when I switch to a different model number, i = "DCL510", I get the error: NoSuchElementException: Message: no such element: Unable to locate element: {"method":"xpath","selector":"./th"}. I compared the HTML for both search items and they look identical to me. Why would I get this error for "DCL510"?
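@Slavisha84: most likely that model's spec table has rows where one of the cells is missing (e.g. no `<th>` label), so `./th` finds nothing. A defensive pattern is to use find_elements (plural), which returns an empty list instead of raising; sketched below with a stand-in row object, since the real thing needs a live browser:

```python
# Sketch of the defensive pattern; FakeRow stands in for a selenium WebElement.
class FakeRow:
    def __init__(self, th_text=None, td_text=None):
        self._th, self._td = th_text, td_text

    def find_elements_by_xpath(self, xpath):
        # Mimics selenium: find_elements returns [] when nothing matches,
        # instead of raising NoSuchElementException like find_element does.
        if xpath == "./th":
            return [self._th] if self._th is not None else []
        return [self._td] if self._td is not None else []

def parse_row(row):
    th = row.find_elements_by_xpath("./th")
    td = row.find_elements_by_xpath("./td/span")
    if th and td:            # skip rows missing either cell
        return th[0], td[0]
    return None

print(parse_row(FakeRow("Brushless", "No")))  # ('Brushless', 'No')
print(parse_row(FakeRow(None, "No")))         # None: row skipped, no exception
```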

Assuming this snippet is correct and the table element has been located:

driver = webdriver.Chrome()
i = "DCD710S2"
base_url = str("https://www.lowes.com/search?searchTerm=" + str(i))
driver.get(base_url)
table = driver.find_element_by_xpath("//*[@id='collapseSpecs']/div/div/div[1]/table/tbody")

The trick here is to use pandas' read_html utility in the following manner so that the parsing works without errors:

dfs = pd.read_html(table.get_attribute('outerHTML'))

This will get you the parsed set of dataframes from the Selenium table element. Hopefully this works for you the way it worked for me. Thanks!
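One note: on newer pandas versions (2.1 and later), passing a literal HTML string to read_html is deprecated, so wrap the markup in StringIO first. A minimal sketch, with a hard-coded table standing in for the Selenium element's outerHTML:

```python
from io import StringIO
import pandas as pd

# Stand-in for table.get_attribute('outerHTML') from selenium
html = """<table>
<tr><th>Battery Voltage</th><td>12-volt max</td></tr>
<tr><th>Charger Included</th><td>Yes</td></tr>
</table>"""

# On pandas 2.1+ wrap the HTML string in StringIO to avoid a deprecation warning
dfs = pd.read_html(StringIO(html))
print(dfs[0])
```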

Comments


You will need to install lxml for this to work using:

pip install lxml 

Code:

import pandas as pd

i = "DCD710S2"
base_url = "https://www.lowes.com/search?searchTerm=" + str(i)

dfs = pd.read_html(base_url)  # returns a list of dataframes, one per table on the page

print(dfs)

Comments
