I am very new to Python, JavaScript, and web scraping. I am trying to write code that saves all of the data in the tables on a product page to a CSV file. The webpage is "https://www.mcmaster.com/cam-lock-fittings/material~aluminum/"

I started by looking for the data in the HTML, but then realized that the website uses JavaScript. I then tried Selenium, but I cannot find the data that is displayed in these tables anywhere in the page source it returns. I wrote this code to dump the rendered page to a text file so I could search it for the displayed data, but I was unable to find it.

from urllib.request import urlopen
from bs4 import BeautifulSoup
from selenium import webdriver

url = 'https://www.mcmaster.com/cam-lock-fittings/material~aluminum/'


options = webdriver.ChromeOptions()
options.add_experimental_option('excludeSwitches', ['enable-logging'])
driver = webdriver.Chrome(executable_path='C:/Users/Brian Knoll/Desktop/chromedriver.exe', options=options)

driver.get(url)
html = driver.execute_script("return document.documentElement.outerHTML")
driver.close()

filename = "McMaster Text.txt"
# Write as UTF-8 so non-ASCII characters in the page do not raise an encoding error
with open(filename, "w", encoding="utf-8") as fo:
    fo.write(html)

I'm sure there's an obvious answer that is just going over my head. Any help would be greatly appreciated! Thank you!

1 Answer

I guess you need to wait until the table you're looking for has loaded.
To do so, add the following line, which waits up to 10 seconds for the table container to appear before you start scraping the data:

fullLoad = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.XPATH, "//div[contains(@class, 'ItmTblCntnr')]")))
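
To make clear what "wait up to 10 seconds" means: an explicit wait does not sleep for a fixed time. It re-checks a condition at short intervals, returns as soon as the condition holds, and raises a timeout error only if the time limit runs out. The sketch below illustrates that idea in plain Python. It is not Selenium's implementation; `wait_until`, `WaitTimeout`, and `table_is_present` are hypothetical names made up for this illustration, and the half-second default interval mirrors Selenium's default poll frequency.

```python
import time


class WaitTimeout(Exception):
    """Raised when the condition never became truthy (hypothetical name)."""


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Poll `condition` until it returns a truthy value or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result  # stop waiting as soon as the condition holds
        if time.monotonic() >= deadline:
            raise WaitTimeout(f"condition not met within {timeout} seconds")
        time.sleep(poll_interval)


# Stand-in for "the table has appeared": becomes truthy on the third check.
attempts = {"count": 0}


def table_is_present():
    attempts["count"] += 1
    return "table" if attempts["count"] >= 3 else None


found = wait_until(table_is_present, timeout=2.0, poll_interval=0.01)
print(found, attempts["count"])  # prints: table 3
```

If the element never appears, the wait ends with a timeout error instead of returning, which usually means the locator does not match anything on the rendered page.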

Here is the full code:

import os  # needed for os.path.abspath() when locating chromedriver below
from urllib.request import urlopen
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By

url = 'https://www.mcmaster.com/cam-lock-fittings/material~aluminum/'


options = webdriver.ChromeOptions()
options.add_experimental_option('excludeSwitches', ['enable-logging'])
driver = webdriver.Chrome(executable_path=os.path.abspath("chromedriver"), options=options)

driver.get(url)
fullLoad = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.XPATH, "//div[contains(@class, 'ItmTblCntnr')]")))

html = driver.execute_script("return document.documentElement.outerHTML")
driver.close()

filename = "McMaster Text.txt"
# Write as UTF-8 so non-ASCII characters in the page do not raise an encoding error
with open(filename, "w", encoding="utf-8") as fo:
    fo.write(html)
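
Once the rendered HTML actually contains the table, the remaining step from the question is getting the rows into a CSV file. Here is a minimal sketch of that step using only the standard library. It assumes the data sits in ordinary `<tr>`/`<td>`/`<th>` markup, which has not been verified for this site, and it does not handle nested tables. `TableRowParser`, `html_tables_to_csv`, and the sample HTML are made up for illustration; BeautifulSoup could do the same job.

```python
import csv
import io
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None outside a row
        self._cell = None  # text pieces of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Join the pieces and collapse runs of whitespace
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None
            self._cell = None


def html_tables_to_csv(html, out_file):
    """Write every table row found in `html` to `out_file` as CSV."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    csv.writer(out_file).writerows(parser.rows)
    return parser.rows


# Made-up sample markup, not taken from the real page
sample_html = """
<table>
  <tr><th>Part</th><th>Price</th></tr>
  <tr><td>Coupler</td><td>4.50</td></tr>
</table>
"""
buffer = io.StringIO()
rows = html_tables_to_csv(sample_html, buffer)
print(rows)  # prints: [['Part', 'Price'], ['Coupler', '4.50']]
```

In the script above you would pass the `html` string and a real file instead of the in-memory buffer, for example `open("tables.csv", "w", newline="", encoding="utf-8")`.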
11 Comments

Thank you for your response, Rola. I got the following error: NameError: name 'By' is not defined
Put from selenium.webdriver.common.by import By at the top of your script. That will give you access to By.ID, By.XPATH, and the other locator types.
Thanks for your answer, AaronS. It ran that time but threw a timeout exception.
@bknoll16 That's because the element doesn't exist. Sorry, I didn't notice that the id is auto-generated. I have modified my code; please try again now.
@Rola thank you for your persistence with this. I am still getting a timeout exception with the updated code. Is everything working on your end?