Through reading, videos, SO and help from the community, I was able to scrape data from Tessco.com using Selenium and Python.

This website requires a username and password. I have included them in the code below; these are non-essential credentials, created specifically for asking questions.

My end goal is to cycle through an Excel list of part numbers, and search for a set of parameters including price. Prior to introducing a list to cycle through, I am looking to isolate the required information from what was scraped.

I am unsure how to filter this information.

Code is as follows:

    import time
    #Need Selenium for interacting with web elements
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC
    #Need numpy/pandas to interact with large datasets
    import numpy as np
    import pandas as pd
    
    chrome_path = r"C:\Users\James\Documents\Python Scripts\jupyterNoteBooks\ScrapingData\chromedriver_win32\chromedriver.exe"
    driver = webdriver.Chrome(chrome_path)
    driver.get("https://www.tessco.com/login")
    
    userName = "[email protected]"
    password = "PasswordForThis123"
    
    #Set a wait, for elements to load into the DOM
    wait10 = WebDriverWait(driver, 10)
    wait20 = WebDriverWait(driver, 20)
    wait30 = WebDriverWait(driver, 30)
    
    elem = wait10.until(EC.element_to_be_clickable((By.ID, "userID"))) 
    elem.send_keys(userName)
    
    elem = wait10.until(EC.element_to_be_clickable((By.ID, "password"))) 
    elem.send_keys(password)
    
    #Press the login button
    driver.find_element_by_xpath("/html/body/account-login/div/div[1]/form/div[6]/div/button").click()
    
    #Expand the search bar
    searchIcon = wait10.until(EC.element_to_be_clickable((By.XPATH, "/html/body/header/div[2]/div/div/ul/li[2]/i"))) 
    searchIcon.click()
    
    searchBar = wait10.until(EC.element_to_be_clickable((By.XPATH, '/html/body/header/div[3]/input'))) 
    searchBar.click()
    
    #load in manufacture part number from a collection of components, via an Excel file
    
    #Enter information into the search bar
    searchBar.send_keys("HL4RPV-50" + '\n')
    
    # wait for the products information to be loaded
    products = wait30.until(EC.presence_of_all_elements_located((By.XPATH,"//div[@class='CoveoResult']")))
    # create a dictionary to store product and price
    productInfo = {}
    # iterate through all products in the search result and add details to dictionary
    for product in products:
        # get product info such as OEM, Description and Part Number
        productDescr = product.find_element_by_xpath(".//a[@class='productName CoveoResultLink hidden-xs']").text
        mfgPart = product.find_element_by_xpath(".//ul[@class='unlisted info']").text.split('\n')[3]
        mfgName = product.find_element_by_tag_name("img").get_attribute("alt")
        
        # get price
        price = product.find_element_by_xpath(".//div[@class='price']").text.split('\n')[1]
    
        # add details to dictionary
        productInfo[mfgPart, mfgName, productDescr] = price
    
    # print products information   
    print(productInfo)

The output is

{('MFG PART #: HL4RPV-50', 'CommScope', '1/2" Plenum Air Cable, Off White'): '$1.89', ('MFG PART #: HL4RPV-50B', 'CommScope', '1/2" Plenum Air Cable, Blue'): '$1.89', ('MFG PART #: L4HM-D', 'CommScope', '4.3-10 Male for 1/2" AL4RPV-50,LDF4-50A,HL4RPV-50'): '$19.94', ('MFG PART #: L4HR-D', 'CommScope', '4.3-10M RA for 1/2" AL4RPV-50, LDF4-50A, HL4RPV-50'): '$39.26', ('MFG PART #: UPL-4MT-12', 'JMA Wireless', '4.3-10 Male Connector for 1/2” Plenum Cables'): '$32.99', ('MFG PART #: UPL-4F-12', 'JMA Wireless', '4.3-10 Female Connector for 1/2" Plenum'): '$33.33', ('MFG PART #: UPL-4RT-12', 'JMA Wireless', '4.3-10 R/A Male Connector for 1/2" Plenum'): '$42.82', ('MFG PART #: L4HF-D', 'CommScope', '4.3-10 Female for 1/2 in AL4RPV-50, LDF4-50A'): '$20.30'}

I would just want what was referenced in the automated search, so for this example I would be looking for

('MFG PART #: HL4RPV-50', 'CommScope', '1/2" Plenum Air Cable, Off White'): '$1.89'

Eventually, I plan on replacing the HL4RPV-50 tag with a list of items, but for now, I believe I should filter out what is needed.

I doubt the logic is right, but I have tried to print the product info for any part that equals that search requirement, like below.

for item in mfgPart:
    if mfgPart == "HL4RPV-50":
        print(productInfo)

But the above code just printed all output as before.

I then tried to import itertools and run the following:

print(dict(itertools.islice(productInfo.items(), 1)))

Which actually returned the line item I wanted, but there is no guarantee the first returned item is what I am looking for. It would be best if I can filter out the exact search, based on a given part number.

Is there a way I can filter the results based on the input?

Any hints are greatly appreciated.

3 Answers

1

The other answers seem to check whether the part number appears in the MFG PART string, but some items share a prefix, such as HL4RPV-50 and HL4RPV-50B, so a substring check matches both. To isolate the exact part number, I would recommend iterating through the dictionary and splitting the MFG PART string at the colon to get the ID. You can also unpack the other parts of the key to print the information more cleanly, as shown in the example below.

for (mfg_part, comm_scope, name), price in productInfo.items():
    mfg_id = mfg_part.split(': ')[1]
    if mfg_id == 'HL4RPV-50':
        print('Part #:', mfg_id)
        print('Company:', comm_scope)
        print('Name:', name)
        print('Price:', price)
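
Since the end goal in the question is to run this for a list of part numbers, one option is to re-key the dictionary by the bare part number once and then look parts up directly. This is only a sketch: it assumes every key starts with the `MFG PART #: ` prefix seen in the output above, and `index_by_part_number` and the `wanted` list are made-up names for illustration.

```python
def index_by_part_number(product_info):
    """Re-key {(mfg_part, mfg_name, descr): price} by the bare part number."""
    index = {}
    for (mfg_part, mfg_name, descr), price in product_info.items():
        # 'MFG PART #: HL4RPV-50' -> 'HL4RPV-50'; fall back to the whole
        # string if the ': ' separator is missing
        _, sep, part_number = mfg_part.partition(': ')
        if not sep:
            part_number = mfg_part
        index[part_number.strip()] = {
            'manufacturer': mfg_name,
            'description': descr,
            'price': price,
        }
    return index


# Two entries copied from the output in the question
product_info = {
    ('MFG PART #: HL4RPV-50', 'CommScope', '1/2" Plenum Air Cable, Off White'): '$1.89',
    ('MFG PART #: HL4RPV-50B', 'CommScope', '1/2" Plenum Air Cable, Blue'): '$1.89',
}

index = index_by_part_number(product_info)
wanted = ['HL4RPV-50', 'FSJ4-50B']  # hypothetical list of part numbers
for part_number in wanted:
    # .get returns None for a part that was not in the search results
    print(part_number, index.get(part_number))
```

An exact dictionary lookup cannot confuse HL4RPV-50 with HL4RPV-50B. If two results ever share a part number, the later one overwrites the earlier one in this sketch.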

7 Comments

I would have never guessed to do that. Thank you! It works flawlessly. I am now trying to understand your thought process and the above code. I will shoot back comments if I need clarification. Thank you kindly!
I have encountered an error, NoSuchElementException, when I use a different part number. When mfg_id = HL4RPV-50, this returns the correct info. However, when I enter the part FSJ4-50B, I get a NoSuchElementException error. Any idea why?
At which line in the block of code above is an error thrown? Also, does the format of the items on the site ever change? This code assumes that every item has a mfg_part, comm_scope, name, and price. I think that NoSuchElementException is a Selenium error, so your code to actually scrape the elements most likely could not find everything. If you change your scraping code to catch this with a try except around each find element, that may help. Example for the first find:
It doesn't format well in the comments so I'm not going to post the whole block of code here, but inside the try: block have productDescr = product.find_element_by_xpath(".//a[@class='productName CoveoResultLink hidden-xs']").text and in the except NoSuchElementException: block have productDescr = None
I am giving this a try now, but it looks like the XPath was different for each position in the results list. Also, it is possible that the [1] index at the end of price, which was meant to isolate the second element, was forcing price to always return the first price in the list of products generated. I ended up isolating the variable in question using a CSS selector as opposed to XPath. I posted to this thread: stackoverflow.com/questions/56996738/…
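
The try/except idea from the comments above can be wrapped in a small helper so each lookup does not need its own block. This is a rough sketch, not the commenter's actual code: `text_or_none` is a made-up name, and the fallback exception class and `FakeElement` exist only so the snippet runs without Selenium or a browser.

```python
try:
    from selenium.common.exceptions import NoSuchElementException
except ImportError:
    # Stand-in so this sketch still runs where Selenium is not installed
    class NoSuchElementException(Exception):
        pass


def text_or_none(find):
    """Call find() and return the element's text, or None if it is missing."""
    try:
        return find().text
    except NoSuchElementException:
        return None


# Fake element and finder used only to demonstrate both outcomes
class FakeElement:
    text = '1/2" Plenum Air Cable, Off White'


def missing():
    raise NoSuchElementException()


found_text = text_or_none(lambda: FakeElement())
missing_text = text_or_none(missing)
print(found_text, missing_text)
```

Inside the scraping loop, the call would look something like `productDescr = text_or_none(lambda: product.find_element_by_xpath(".//a[@class='productName CoveoResultLink hidden-xs']"))`. The helper only covers the lookup itself; the `.split('\n')[3]` indexing in the question can still raise an IndexError if the text has fewer lines than expected.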
1

Your original example was really close. We just need to loop through the dictionary and check each item against the tuple of strings stored in the key. If you don't mind the nesting, this will do the trick :) You'll just need to adjust the keyword appropriately.

Note:

You might have to use productInfo.iteritems() if using Python 2.x; I'm assuming 3.x here.

Example:

def main():
    # productInfo is the dictionary built in the question.
    # Each item is a (key, price) pair, so key[0] is the tuple of strings.
    for key in productInfo.items():
        # Walk through each string in the key tuple
        for stringList in key[0]:
            # For every line in that string
            for newline in stringList.splitlines():
                # Split the line into words
                for string in newline.split(' '):
                    # Check each word against our keyword
                    if string == "HL4RPV-50B":
                        print(key)

if __name__ == '__main__':
    main()
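
For comparison, here is a flatter take on the same whole-word check, run against three entries copied from the question's output. It is a sketch rather than a drop-in replacement: `keys_with_word` is a made-up name, and it splits on any whitespace instead of on single spaces. Because it looks at every field of the key, a description that mentions the part number also counts as a match.

```python
def keys_with_word(product_info, keyword):
    """Return keys where any whitespace-separated word in any field equals keyword."""
    return [
        key for key in product_info
        if any(word == keyword for field in key for word in field.split())
    ]


# Three entries copied from the output in the question
product_info = {
    ('MFG PART #: HL4RPV-50', 'CommScope', '1/2" Plenum Air Cable, Off White'): '$1.89',
    ('MFG PART #: HL4RPV-50B', 'CommScope', '1/2" Plenum Air Cable, Blue'): '$1.89',
    ('MFG PART #: L4HR-D', 'CommScope', '4.3-10M RA for 1/2" AL4RPV-50, LDF4-50A, HL4RPV-50'): '$39.26',
}

matches = keys_with_word(product_info, 'HL4RPV-50')
for key in matches:
    print(key)
```

With this sample the HL4RPV-50B entry is skipped, but the L4HR-D connector is returned because its description ends with the word HL4RPV-50. Restricting the check to the first field of the key avoids that.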

3 Comments

Yes, I am using Python 3. Sorry for not clarifying! I see what you are doing there, and it looks neat, though the code above returns nothing, and I know the string HL4RPV-50 exists. Also, there is a second list item named HL4RPV-50B; wouldn't this item with a "B" appended also return a value?
Thank you David! Calling out the index of zero fixes the output, though it does print out the other line items that contain that string. I am looking into @jkulskis's solution. It works very well, and I am trying to understand it now.
@JamesHayek No problem
0

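You can use filter on the dictionary items. The key is a tuple, so test its first element (the MFG PART string). Note that this is a substring match, so it also returns HL4RPV-50B:

 searchedProduct = dict(filter(lambda item: "HL4RPV-50" in item[0][0], productInfo.items()))
 print(searchedProduct)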
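
It may help to see how `in` behaves on one key from the question's output. A small check, with variable names chosen only for illustration: against the key tuple, `in` compares whole elements; against the MFG PART string, it is a substring test; comparing the text after the colon gives an exact match.

```python
# One key copied from the output in the question
key = ('MFG PART #: HL4RPV-50B', 'CommScope', '1/2" Plenum Air Cable, Blue')
search = 'HL4RPV-50'

# Tuple membership: is any whole element equal to the search string?
in_tuple = search in key

# String membership: does the MFG PART string contain the search string?
in_string = search in key[0]

# Exact comparison of the part number after the 'MFG PART #: ' prefix
is_exact = key[0].split(': ', 1)[1] == search

print(in_tuple, in_string, is_exact)
```

Only the substring test is true for this key, which is why a substring filter returns the HL4RPV-50B entry when searching for HL4RPV-50.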
