Through reading, videos, SO and help from the community, I was able to scrape data from Tessco.com using Selenium and Python.
This website requires a UN and PW. I have included this in the code below, this is non-essential credentials, made specifically to ask questions.
My end goal is to cycle through an Excel list of part numbers, and search for a set of parameters including price. Prior to introducing a list to cycle through, I am looking to isolate the required information from what was scraped.
I am unsure how to filter this information.
Code is as follows:
import time
#Need Selenium for interacting with web elements
from selenium import webdriver
from selenium.webdriver.support import expected_conditions as EC
#Need numpy/pandas to interact with large datasets
import numpy as np
import pandas as pd
chrome_path = r"C:\Users\James\Documents\Python Scripts\jupyterNoteBooks\ScrapingData\chromedriver_win32\chromedriver.exe"
driver = webdriver.Chrome(chrome_path)
driver.get("https://www.tessco.com/login")
userName = "[email protected]"
password = "PasswordForThis123"
#Set a wait, for elements to load into the DOM
wait10 = WebDriverWait(driver, 10)
wait20 = WebDriverWait(driver, 20)
wait30 = WebDriverWait(driver, 30)
elem = wait10.until(EC.element_to_be_clickable((By.ID, "userID")))
elem.send_keys(userName)
elem = wait10.until(EC.element_to_be_clickable((By.ID, "password")))
elem.send_keys(password)
#Press the login button
driver.find_element_by_xpath("/html/body/account-login/div/div[1]/form/div[6]/div/button").click()
#Expand the search bar
searchIcon = wait10.until(EC.element_to_be_clickable((By.XPATH, "/html/body/header/div[2]/div/div/ul/li[2]/i")))
searchIcon.click()
searchBar = wait10.until(EC.element_to_be_clickable((By.XPATH, '/html/body/header/div[3]/input')))
searchBar.click()
#load in manufacture part number from a collection of components, via an Excel file
#Enter information into the search bar
searchBar.send_keys("HL4RPV-50" + '\n')
# wait for the products information to be loaded
products = wait30.until(EC.presence_of_all_elements_located((By.XPATH,"//div[@class='CoveoResult']")))
# create a dictionary to store product and price
productInfo = {}
# iterate through all products in the search result and add details to dictionary
for product in products:
# get product info such as OEM, Description and Part Number
productDescr = product.find_element_by_xpath(".//a[@class='productName CoveoResultLink hidden-xs']").text
mfgPart = product.find_element_by_xpath(".//ul[@class='unlisted info']").text.split('\n')[3]
mfgName = product.find_element_by_tag_name("img").get_attribute("alt")
# get price
price = product.find_element_by_xpath(".//div[@class='price']").text.split('\n')[1]
# add details to dictionary
productInfo[mfgPart, mfgName, productDescr] = price
# print products information
print(productInfo)
The output is
{('MFG PART #: HL4RPV-50', 'CommScope', '1/2" Plenum Air Cable, Off White'): '$1.89', ('MFG PART #: HL4RPV-50B', 'CommScope', '1/2" Plenum Air Cable, Blue'): '$1.89', ('MFG PART #: L4HM-D', 'CommScope', '4.3-10 Male for 1/2" AL4RPV-50,LDF4-50A,HL4RPV-50'): '$19.94', ('MFG PART #: L4HR-D', 'CommScope', '4.3-10M RA for 1/2" AL4RPV-50, LDF4-50A, HL4RPV-50'): '$39.26', ('MFG PART #: UPL-4MT-12', 'JMA Wireless', '4.3-10 Male Connector for 1/2” Plenum Cables'): '$32.99', ('MFG PART #: UPL-4F-12', 'JMA Wireless', '4.3-10 Female Connector for 1/2" Plenum'): '$33.33', ('MFG PART #: UPL-4RT-12', 'JMA Wireless', '4.3-10 R/A Male Connector for 1/2" Plenum'): '$42.82', ('MFG PART #: L4HF-D', 'CommScope', '4.3-10 Female for 1/2 in AL4RPV-50, LDF4-50A'): '$20.30'}
I would just want what was referenced in the automated search, so for this example I would be looking for
('MFG PART #: HL4RPV-50', 'CommScope', '1/2" Plenum Air Cable, Off White'): '$1.89'
Eventually, I plan on replacing the HL4RPV-50 tag with a list of items, but for now, I belive I should filter what is needed.
I doubt the logic is right, but I have tried to print the product info for any part that equals that search requirement, like below.
for item in mfgPart:
if mfgPart == "HL4RPV-50":
print(productInfo)
But the above code just printed all output as before.
I then tried to import itertools and run the following:
print(dict(itertools.islice(productInfo.items(), 1)))
Which actually returned the line item I wanted, but there is no guarantee the first returned item is what I am looking for. It would be best if I can filter out the exact search, based on a given part number.
Is there a way I can filter the results based on the input?
Any hints are greatly appreciated.