How do I extract the data from these JavaScript tables using Selenium and Python?

Question

I am very new to Python, JavaScript, and web scraping. I am trying to write code that writes all of the data in tables like this into a CSV file. The webpage is "https://www.mcmaster.com/cam-lock-fittings/material~aluminum/".

I started by looking for the data in the HTML, but then realized that the website uses JavaScript. I then tried Selenium, but I cannot find the data that is displayed in these tables anywhere in the page code that Selenium returns. I wrote this code to see whether the displayed data shows up anywhere, but I was unable to find it.

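from selenium import webdriver
from selenium.webdriver.chrome.service import Service

url = 'https://www.mcmaster.com/cam-lock-fittings/material~aluminum/'

options = webdriver.ChromeOptions()
options.add_experimental_option('excludeSwitches', ['enable-logging'])
# Selenium 4 takes the driver path through a Service object; newer releases
# no longer accept the executable_path= keyword.
service = Service('C:/Users/Brian Knoll/Desktop/chromedriver.exe')
driver = webdriver.Chrome(service=service, options=options)

driver.get(url)
# Grab the DOM as rendered right now; nothing here waits for JavaScript
# to finish building the tables.
html = driver.execute_script("return document.documentElement.outerHTML")
driver.quit()  # quit() ends the whole browser session; close() only closes the window

filename = "McMaster Text.txt"
# UTF-8 avoids UnicodeEncodeError on Windows when the page has non-ASCII characters.
with open(filename, "w", encoding="utf-8") as fo:
    fo.write(html)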

I'm sure there's an obvious answer that is just going over my head. Any help would be greatly appreciated! Thank you!

Rola · Accepted Answer · 2020-07-16 08:26:27Z


I guess you need to wait until the table you're looking for has loaded.
To do so, add the following line, which waits up to 10 seconds for the table container to appear before you start scraping the data:

fullLoad = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.XPATH, "//div[contains(@class, 'ItmTblCntnr')]")))
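Note that WebDriverWait(driver, 10) is an upper bound, not a fixed delay: until() re-checks the condition (every 0.5 seconds by default), returns as soon as it succeeds, and raises TimeoutException if it still fails after 10 seconds, which is what happens when the XPath never matches anything. The plain-Python sketch below only illustrates that polling behaviour. wait_until is a hypothetical helper, not part of Selenium, and it raises the built-in TimeoutError instead; Selenium's own wait additionally ignores NoSuchElementException between polls by default.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def wait_until(condition: Callable[[], T], timeout: float = 10.0,
               poll_frequency: float = 0.5) -> T:
    """Call `condition` repeatedly until it returns a truthy value.

    Hypothetical illustration of explicit-wait semantics, not Selenium's code:
    returns as soon as the condition succeeds and raises TimeoutError once
    `timeout` seconds have passed without success.
    """
    end_time = time.monotonic() + timeout
    while True:
        value = condition()
        if value:
            return value  # condition met early: no need to wait the full timeout
        if time.monotonic() >= end_time:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll_frequency)  # pause before checking again
```

For example, a condition that first succeeds on its third check makes wait_until return after roughly two poll intervals rather than after the full timeout.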

Here is the full code:

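import os

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

url = 'https://www.mcmaster.com/cam-lock-fittings/material~aluminum/'

options = webdriver.ChromeOptions()
options.add_experimental_option('excludeSwitches', ['enable-logging'])
# Looks for chromedriver in the current working directory; Selenium 4 takes the
# path via a Service object (newer releases no longer accept executable_path=).
driver = webdriver.Chrome(service=Service(os.path.abspath("chromedriver")), options=options)

try:
    driver.get(url)
    # Wait up to 10 seconds for a <div> whose class contains 'ItmTblCntnr'.
    # Raises TimeoutException if no such element ever appears.
    fullLoad = WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.XPATH, "//div[contains(@class, 'ItmTblCntnr')]")))
    html = driver.execute_script("return document.documentElement.outerHTML")
finally:
    driver.quit()  # end the browser session even if the wait times out

filename = "McMaster Text.txt"
# UTF-8 avoids UnicodeEncodeError on Windows when the page has non-ASCII characters.
with open(filename, "w", encoding="utf-8") as fo:
    fo.write(html)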
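Saving the page source is only half of the original goal; the table data still has to be written to a CSV file. Below is a minimal standard-library sketch of that step. It assumes the product data ends up as ordinary <tr>/<th>/<td> markup in the saved HTML, which is an assumption, not something confirmed for this page; if the site lays its tables out with <div> elements, the tag checks would need to change. TableRowParser, html_tables_to_csv and the output file name used further down are hypothetical names chosen for illustration.

```python
import csv
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of every <th>/<td> cell, grouped by <tr> row.

    Assumes every cell has an explicit closing tag and that tables are not nested.
    """

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only text inside a cell is kept; scripts and other markup are ignored.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Collapse whitespace runs so each cell becomes one clean string.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # skip rows that contained no cells
                self.rows.append(self._row)
            self._row = None


def html_tables_to_csv(html: str, out) -> int:
    """Write every table row found in `html` to the file object `out`; return the row count."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    csv.writer(out).writerows(parser.rows)
    return len(parser.rows)
```

To use it, read "McMaster Text.txt" with encoding="utf-8" and pass in a file opened with open("mcmaster_tables.csv", "w", newline="", encoding="utf-8"); newline="" is what the csv module documentation recommends so that no blank lines appear between rows on Windows. The explicit-closing-tag assumption holds for HTML obtained through outerHTML, because the browser's serializer writes an end tag for every non-void element.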

answered Jul 16, 2020 at 5:45 by Rola; edited Jul 16, 2020 at 8:26



11 Comments

bknoll16 Over a year ago

Thank you for your response, Rola. I got the following error: NameError: name 'By' is not defined

AaronS Over a year ago

Put the line from selenium.webdriver.common.by import By at the top of your script. That import gives you access to By.ID and the other locator strategies, such as By.XPATH.

bknoll16 Over a year ago

Thanks for your answer, AaronS. It ran that time, but it threw a TimeoutException.

Rola Over a year ago

@bknoll16 That's because the element doesn't exist; sorry, I didn't notice that the id is auto-generated. I have modified my code, so please try again now.

bknoll16 Over a year ago

@Rola thank you for your persistence with this. I am still getting a timeout exception with the updated code. Is everything working on your end?
