
I have been developing in JavaScript for a while, but Python still feels fairly new to me. I'm trying to scrape the content of a simple webpage with Python (basically a product list split into several sections). The content is dynamically generated, so I'm using the selenium module for this.

The content is structured like this, with several product sections:

<div class="product-section">
    <div class="section-title">
        Product section name
    </div>
    <ul class="products">
        <li class="product">
            <div class="name">Wooden Table</div>
            <div class="price">99 USD</div>
            <div class="color">White</div>
        </li>
    </ul>
</div>

Python code for scraping the products:

from selenium import webdriver

driver = webdriver.Chrome()
driver.get('https://website.com')
names = driver.find_elements_by_css_selector('div.name')
prices = driver.find_elements_by_css_selector('div.price')
colors = driver.find_elements_by_css_selector('div.color')

allNames = [name.text for name in names]
allPrices = [price.text for price in prices]
allColors = [color.text for color in colors]

Right now I get the attributes of all products (see below), but I can't separate them by section.

Current outcome:
Wooden Table, 99 USD, White
Lawn Chair, 39 USD, Black
Tent - 4 Person, 299 USD, Camo
etc.

Desired outcome:
Outdoor Furniture
Wooden Table, 99 USD, White
Lawn Chair, 39 USD, Black

Camping Gear
Tent - 4 Person, 299 USD, Camo
Thermos, 19 USD, Metallic

The end goal is to output the contents into an Excel product list, which is why I need to keep the sections separate (each with its matching section title). Any idea how to keep them apart, even though they share the same class names?

  • Suggest you look at the Beautiful Soup library at crummy.com/software/BeautifulSoup/bs4/doc Commented Apr 14, 2018 at 0:58
  • It seems like it has the functions I would need, thank you! Commented Apr 14, 2018 at 1:02
  • BeautifulSoup is a very powerful library, but it might be overkill for simpler tasks - another API to learn. Vanilla Selenium scraping is quite up to a task like this one. Commented Apr 14, 2018 at 10:00
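For reference, here is a minimal sketch of the Beautiful Soup route suggested in the comments above, parsing the page source that Selenium has already rendered. This is only an illustration: it assumes bs4 is installed and reuses the class names from the sample HTML in the question.

from bs4 import BeautifulSoup

soup = BeautifulSoup(driver.page_source, 'html.parser')

for section in soup.select('div.product-section'):
    # each section carries its own title and its own product list
    title = section.select_one('div.section-title').get_text(strip=True)
    print(title)
    for product in section.select('li.product'):
        name = product.select_one('div.name').get_text(strip=True)
        price = product.select_one('div.price').get_text(strip=True)
        color = product.select_one('div.color').get_text(strip=True)
        print(name, price, color)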

1 Answer


You're almost there: to group the products by section, start from each section element and locate all the product elements within it. At least, your sample HTML implies its structure allows that.

Based on your code, here's a solution with explanatory comments.

from selenium import webdriver

driver = webdriver.Chrome()
driver.get('https://website.com')

# a dict where the key will be the section name
products = {}

# find all top-level sections
sections = driver.find_elements_by_css_selector('div.product-section')

# iterate over each one
for section in sections:
    # find the products that are children of this section
    # note the find is called on section, not driver
    names = section.find_elements_by_css_selector('div.name')
    prices = section.find_elements_by_css_selector('div.price')
    colors = section.find_elements_by_css_selector('div.color')

    allNames = [name.text for name in names]
    allPrices = [price.text for price in prices]
    allColors = [color.text for color in colors]

    section_name = section.find_element_by_css_selector('div.section-title').text

    # add the current scraped section to the products dict
    # matching each name to its price and color is left to you ;)
    # (a zip-based sketch follows below)

    products[section_name] = {'names': allNames,
                              'prices': allPrices,
                              'colors': allColors,}

# and here's how to access the result

# get the 1st name in a section:
print(products['Product section name']['names'][0])  # will output "Wooden Table"

# iterate over the sections and products:
for section_name, data in products.items():
    print('Section: {}'.format(section_name))
    print('All prices in the section:')
    for price in data['prices']:
        print(price)
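To match each product's name, price and color, and to get the result into the Excel list mentioned in the question, one option is to zip the three per-section lists and write the rows out with openpyxl. This is only a sketch: it assumes the three lists stay aligned (they are scraped in document order), and openpyxl is my suggestion, not a library named anywhere in this thread.

from openpyxl import Workbook

wb = Workbook()
ws = wb.active

for section_name, data in products.items():
    # write the section title on its own row
    ws.append([section_name])
    # zip keeps the i-th name, price and color together,
    # relying on the three lists sharing the same document order
    for name, price, color in zip(data['names'], data['prices'], data['colors']):
        ws.append([name, price, color])
    # blank row between sections
    ws.append([])

wb.save('products.xlsx')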

1 Comment

Thank you so much! This is the exact structure I had in mind but did not know how to go about it.
