Python, Selenium: Isolate Item From Returned List

Question

Through reading, videos, SO and help from the community, I was able to scrape data from Tessco.com using Selenium and Python.

This website requires a username and password. I have included them in the code below; these are non-essential credentials, made specifically for asking questions.

My end goal is to cycle through an Excel list of part numbers, and search for a set of parameters including price. Prior to introducing a list to cycle through, I am looking to isolate the required information from what was scraped.

I am unsure how to filter this information.

Code is as follows:

    import time
    #Need Selenium for interacting with web elements
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC
    #Need numpy/pandas to interact with large datasets
    import numpy as np
    import pandas as pd
    
    chrome_path = r"C:\Users\James\Documents\Python Scripts\jupyterNoteBooks\ScrapingData\chromedriver_win32\chromedriver.exe"
    driver = webdriver.Chrome(chrome_path)
    driver.get("https://www.tessco.com/login")
    
    userName = "[email protected]"
    password = "PasswordForThis123"
    
    #Set a wait, for elements to load into the DOM
    wait10 = WebDriverWait(driver, 10)
    wait20 = WebDriverWait(driver, 20)
    wait30 = WebDriverWait(driver, 30)
    
    elem = wait10.until(EC.element_to_be_clickable((By.ID, "userID"))) 
    elem.send_keys(userName)
    
    elem = wait10.until(EC.element_to_be_clickable((By.ID, "password"))) 
    elem.send_keys(password)
    
    #Press the login button
    driver.find_element_by_xpath("/html/body/account-login/div/div[1]/form/div[6]/div/button").click()
    
    #Expand the search bar
    searchIcon = wait10.until(EC.element_to_be_clickable((By.XPATH, "/html/body/header/div[2]/div/div/ul/li[2]/i"))) 
    searchIcon.click()
    
    searchBar = wait10.until(EC.element_to_be_clickable((By.XPATH, '/html/body/header/div[3]/input'))) 
    searchBar.click()
    
    #load in manufacture part number from a collection of components, via an Excel file
    
    #Enter information into the search bar
    searchBar.send_keys("HL4RPV-50" + '\n')
    
    # wait for the products information to be loaded
    products = wait30.until(EC.presence_of_all_elements_located((By.XPATH,"//div[@class='CoveoResult']")))
    # create a dictionary to store product and price
    productInfo = {}
    # iterate through all products in the search result and add details to dictionary
    for product in products:
        # get product info such as OEM, Description and Part Number
        productDescr = product.find_element_by_xpath(".//a[@class='productName CoveoResultLink hidden-xs']").text
        mfgPart = product.find_element_by_xpath(".//ul[@class='unlisted info']").text.split('\n')[3]
        mfgName = product.find_element_by_tag_name("img").get_attribute("alt")
        
        # get price
        price = product.find_element_by_xpath(".//div[@class='price']").text.split('\n')[1]
    
        # add details to dictionary
        productInfo[mfgPart, mfgName, productDescr] = price
    
    # print products information   
    print(productInfo)

The output is

{('MFG PART #: HL4RPV-50', 'CommScope', '1/2" Plenum Air Cable, Off White'): '$1.89', ('MFG PART #: HL4RPV-50B', 'CommScope', '1/2" Plenum Air Cable, Blue'): '$1.89', ('MFG PART #: L4HM-D', 'CommScope', '4.3-10 Male for 1/2" AL4RPV-50,LDF4-50A,HL4RPV-50'): '$19.94', ('MFG PART #: L4HR-D', 'CommScope', '4.3-10M RA for 1/2" AL4RPV-50, LDF4-50A, HL4RPV-50'): '$39.26', ('MFG PART #: UPL-4MT-12', 'JMA Wireless', '4.3-10 Male Connector for 1/2” Plenum Cables'): '$32.99', ('MFG PART #: UPL-4F-12', 'JMA Wireless', '4.3-10 Female Connector for 1/2" Plenum'): '$33.33', ('MFG PART #: UPL-4RT-12', 'JMA Wireless', '4.3-10 R/A Male Connector for 1/2" Plenum'): '$42.82', ('MFG PART #: L4HF-D', 'CommScope', '4.3-10 Female for 1/2 in AL4RPV-50, LDF4-50A'): '$20.30'}

I would just want what was referenced in the automated search, so for this example I would be looking for

('MFG PART #: HL4RPV-50', 'CommScope', '1/2" Plenum Air Cable, Off White'): '$1.89'

Eventually, I plan on replacing the HL4RPV-50 tag with a list of items, but for now, I believe I should filter what is needed.

I doubt the logic is right, but I have tried to print the product info for any part that equals that search requirement, like below.

for item in mfgPart:
    if mfgPart == "HL4RPV-50":
        print(productInfo)

But the above code just printed all output as before.

I then tried to import itertools and run the following:

print(dict(itertools.islice(productInfo.items(), 1)))

Which actually returned the line item I wanted, but there is no guarantee the first returned item is what I am looking for. It would be best if I can filter out the exact search, based on a given part number.

Is there a way I can filter the results based on the input?

Any hints are greatly appreciated.

jkulskis · Accepted Answer · 2019-07-08 19:19:48Z

1

The other answers seem to check if the part number is in the mfg part string, but I saw that some items may contain the same part number, such as HL4RPV-50 and HL4RPV-50B. If you want to isolate the part number so that you can know exactly what part you are looking at, I would recommend iterating through the dictionary, and splitting the mfg part string at the colon to get the ID. You can also grab the other parts of the item to more cleanly print out the information, as shown in the example below.

for (mfg_part, comm_scope, name), price in productInfo.items():
    mfg_id = mfg_part.split(': ')[1]
    if mfg_id == 'HL4RPV-50':
        print('Part #:', mfg_id)
        print('Company:', comm_scope)
        print('Name:', name)
        print('Price:', price)
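
As a minimal sketch of how this could be reused once the search term comes from a list of part numbers, the same split-and-compare logic can be wrapped in a function. The function name `find_exact` and the hard-coded `sample` dictionary (copied from the output in the question) are assumptions made for illustration, not part of the original code.

```python
def find_exact(product_info, part_number):
    """Return the entries whose MFG part number equals part_number exactly."""
    matches = {}
    for (mfg_part, mfg_name, descr), price in product_info.items():
        # 'MFG PART #: HL4RPV-50' -> 'HL4RPV-50' (empty string if no ': ' is present)
        mfg_id = mfg_part.partition(': ')[2]
        if mfg_id == part_number:
            matches[(mfg_part, mfg_name, descr)] = price
    return matches


# Sample data copied from the output shown in the question
sample = {
    ('MFG PART #: HL4RPV-50', 'CommScope', '1/2" Plenum Air Cable, Off White'): '$1.89',
    ('MFG PART #: HL4RPV-50B', 'CommScope', '1/2" Plenum Air Cable, Blue'): '$1.89',
    ('MFG PART #: L4HM-D', 'CommScope', '4.3-10 Male for 1/2" AL4RPV-50,LDF4-50A,HL4RPV-50'): '$19.94',
}

# Look up several part numbers in turn; an unknown part gives an empty dict
for part in ['HL4RPV-50', 'L4HM-D', 'FSJ4-50B']:
    print(part, '->', find_exact(sample, part))
```

Because the comparison is on the whole ID rather than a substring, searching for HL4RPV-50 does not also return HL4RPV-50B.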

edited Jul 8, 2019 at 19:19

answered Jul 8, 2019 at 19:13


7 Comments

James Hayek Over a year ago

I would have never guessed to do that. Thank you! It works flawlessly. I am now trying to understand your thought process and the above code. I will shoot back comments if I need clarification. Thank you kindly!

James Hayek Over a year ago

I have encountered an error, NoSuchElementException, when I use a different part number. When mfg_id = HL4RPV-50, this returns the correct info. However, when I enter the part FSJ4-50B, I get a NoSuchElementException error. Any idea why?

jkulskis Over a year ago

At which line in the block of code above is an error thrown? Also, does the format of the items on the site ever change? This code assumes that every item has a mfg_part, comm_scope, name, and price. I think that NoSuchElementException is a Selenium error, so your code to actually scrape the elements most likely could not find everything. If you change your scraping code to catch this with a try except around each find element, that may help. Example for the first find:

jkulskis Over a year ago

It doesn't format well in the comments so I'm not going to post the whole block of code here, but inside the try: block have productDescr = product.find_element_by_xpath(".//a[@class='productName CoveoResultLink hidden-xs']").text and in the except NoSuchElementException: block have productDescr = None
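
The try/except idea from the comment above could be sketched roughly as follows. The helper name `safe_text` and the stand-in classes are hypothetical and exist only so the snippet runs without a browser; with Selenium, the exception passed in would be `NoSuchElementException` from `selenium.common.exceptions`.

```python
def safe_text(find_element, not_found):
    """Call find_element() and return its .text, or None if not_found is raised."""
    try:
        return find_element().text
    except not_found:
        return None


# With Selenium this might be called roughly like this (not executed here):
#   from selenium.common.exceptions import NoSuchElementException
#   productDescr = safe_text(
#       lambda: product.find_element_by_xpath(
#           ".//a[@class='productName CoveoResultLink hidden-xs']"),
#       NoSuchElementException)

# Stand-ins (assumptions for illustration) so the sketch runs without a browser
class FakeElement:
    text = '1/2" Plenum Air Cable, Off White'


def found():
    return FakeElement()


def missing():
    raise LookupError("no such element")


print(safe_text(found, LookupError))
print(safe_text(missing, LookupError))
```

A value of None then marks a field that was not present for that result, instead of the whole loop stopping on the first missing element.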

James Hayek Over a year ago

I am giving this a try now, but it looks like the XPath was different for each position in the results list. Also, it is possible that the [1] at the end of price, which was meant to isolate the second element, was forcing price to always return the first price in the list of products generated. I ended up isolating the variable in question using a CSS selector as opposed to XPath. I posted to this thread: stackoverflow.com/questions/56996738/…


David Silveiro · Accepted Answer · 2019-07-08 20:16:04Z

1

Your original example was really close; we just need to loop through and check each item against the strings that make up the key of our dictionary. If you don't mind the nesting, this will do the trick :) You'll just need to adjust the keyword appropriately.

Note:

You might have to use productInfo.iteritems() if you are on Python 2.x; I am assuming 3.x in this case.

Example:

def main():
    # Each item is a (key, price) pair; the key is a tuple of strings
    for item in productInfo.items():

        # Walk the strings in the key tuple (part number, maker, description)
        for text in item[0]:

            # Handle every line in the string
            for line in text.splitlines():

                # Split the line into words
                for word in line.split(' '):

                    # Check each word against our keyword
                    if word == "HL4RPV-50B":
                        print(item)

if __name__ == '__main__':
    main()
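
If the nesting is a concern, roughly the same word-level comparison can be sketched more compactly with `any()`. The helper name `key_has_word` and the `sample` dictionary (three entries copied from the output in the question) are assumptions for illustration.

```python
def key_has_word(key, keyword):
    """True if any whitespace-separated word in the key's strings equals keyword."""
    return any(word == keyword for text in key for word in text.split())


# Sample data copied from the output shown in the question
sample = {
    ('MFG PART #: HL4RPV-50', 'CommScope', '1/2" Plenum Air Cable, Off White'): '$1.89',
    ('MFG PART #: HL4RPV-50B', 'CommScope', '1/2" Plenum Air Cable, Blue'): '$1.89',
    ('MFG PART #: L4HR-D', 'CommScope', '4.3-10M RA for 1/2" AL4RPV-50, LDF4-50A, HL4RPV-50'): '$39.26',
}

matches = {key: price for key, price in sample.items()
           if key_has_word(key, 'HL4RPV-50')}
for key, price in matches.items():
    print(key, price)
```

Note that this checks every string in the key, so an item whose description happens to end with the word HL4RPV-50 (the L4HR-D connector here) is matched too, while HL4RPV-50B is not. Restricting the check to the first element of the key would avoid the description matches.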

edited Jul 8, 2019 at 20:16

answered Jul 8, 2019 at 19:07


3 Comments

James Hayek Over a year ago

Yes, I am using Python 3. Sorry for not clarifying! I see what you are doing there, and it looks neat, though the code above returns nothing, and I know the string HL4RPV-50 exists. Also, there is a second list item named HL4RPV-50B; wouldn't this item with a "B" appended also return a value?

James Hayek Over a year ago

Thank you David! Calling out the index of zero fixes the output, though it does print out the other line items that contain that string. I am looking into @jkulskis's solution. It works very well, and I am trying to understand it now.

David Silveiro Over a year ago

@JamesHayek No problem

Chandra Shekhar · Accepted Answer · 2019-07-08 19:06:59Z

0

You can filter the Python dictionary with code like this:

# item[0] is the key tuple; item[0][0] is the 'MFG PART #: ...' string
searchedProduct = dict(filter(lambda item: "HL4RPV-50" in item[0][0], productInfo.items()))
print(searchedProduct)
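
One detail worth checking with this kind of filter is what `in` is applied to, since the dictionary keys here are tuples of strings. The following is a small sketch on sample data copied from the question's output (the `sample`, `by_membership`, and `by_substring` names are assumptions for illustration), comparing a membership test on the key tuple with a substring test on the part-number string.

```python
# Sample data copied from the output shown in the question
sample = {
    ('MFG PART #: HL4RPV-50', 'CommScope', '1/2" Plenum Air Cable, Off White'): '$1.89',
    ('MFG PART #: HL4RPV-50B', 'CommScope', '1/2" Plenum Air Cable, Blue'): '$1.89',
    ('MFG PART #: L4HM-D', 'CommScope', '4.3-10 Male for 1/2" AL4RPV-50,LDF4-50A,HL4RPV-50'): '$19.94',
}
keyword = "HL4RPV-50"

# Membership test on the key tuple: true only if one element equals the keyword
by_membership = dict(filter(lambda item: keyword in item[0], sample.items()))

# Substring test on the first element of the key (the 'MFG PART #: ...' string)
by_substring = dict(filter(lambda item: keyword in item[0][0], sample.items()))

print(len(by_membership))
print(sorted(key[0] for key in by_substring))
```

On this data the membership test finds nothing, because no tuple element is exactly "HL4RPV-50", while the substring test returns both HL4RPV-50 and HL4RPV-50B.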

answered Jul 8, 2019 at 19:06
