I have been writing JavaScript for a fair while, but Python is still fairly new to me. I'm trying to scrape the content of a simple webpage with Python (basically a product list split into sections). The content is generated dynamically, so I'm using the selenium module for this.
The page contains several product sections, each structured like this:
<div class="product-section">
  <div class="section-title">
    Product section name
  </div>
  <ul class="products">
    <li class="product">
      <div class="name">Wooden Table</div>
      <div class="price">99 USD</div>
      <div class="color">White</div>
    </li>
  </ul>
</div>
Python code for scraping the products:
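from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get("https://website.com")  # placeholder URL; driver.get needs the scheme
# Each lookup searches the whole page, so products from every section land in one flat list.
names = driver.find_elements(By.CSS_SELECTOR, "div.name")
prices = driver.find_elements(By.CSS_SELECTOR, "div.price")
colors = driver.find_elements(By.CSS_SELECTOR, "div.color")
allNames = [name.text for name in names]
allPrices = [price.text for price in prices]
allColors = [color.text for color in colors]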
Right now I get the attributes of every product (see below), but I can't tell which section each one belongs to.
Current outcome:
Wooden Table, 99 USD, White
Lawn Chair, 39 USD, Black
Tent - 4 Person, 299 USD, Camo
etc.
Desired outcome:
Outdoor Furniture
Wooden Table, 99 USD, White
Lawn Chair, 39 USD, Black
Camping Gear
Tent - 4 Person, 299 USD, Camo
Thermos, 19 USD, Metallic
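A minimal sketch of how that grouping might be produced, not a tested solution for the real page. It relies on the fact that calling find_element or find_elements on an element (rather than on the driver) only searches that element's descendants. The names collect_sections, scrape and CSS are hypothetical, the selectors are taken from the HTML above, and it assumes the dynamic content has already rendered by the time the lookups run (otherwise an explicit wait would be needed first).

```python
# Locator strategy string; same value as selenium's By.CSS_SELECTOR constant.
CSS = "css selector"


def collect_sections(root):
    """Return [(section title, [(name, price, color), ...]), ...].

    `root` is the driver (or any element). Lookups made on a section or
    product element only see that element's descendants, so products
    stay attached to the section they were found in.
    """
    grouped = []
    for section in root.find_elements(CSS, "div.product-section"):
        title = section.find_element(CSS, "div.section-title").text.strip()
        products = []
        for product in section.find_elements(CSS, "li.product"):
            products.append(tuple(
                product.find_element(CSS, "div." + field).text.strip()
                for field in ("name", "price", "color")
            ))
        grouped.append((title, products))
    return grouped


def scrape(url):
    """Open `url` in Chrome and return the grouped products (hypothetical helper)."""
    # Imported here so collect_sections can be exercised without a browser.
    from selenium import webdriver

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        return collect_sections(driver)
    finally:
        driver.quit()
```

Returning a list of (title, products) pairs instead of a dict keeps the page order and does not silently merge two sections that happen to share a title.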
The end goal is to write the contents to an Excel product list, which is why I need to keep the sections separate (each with its matching section title). Any idea how to keep them separate, even though the elements in every section share the same class names?
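To make the target format concrete, here is a rough sketch of the output step using only the standard library. It assumes a CSV file that Excel can open is acceptable in place of a native .xlsx workbook, and that repeating the section title in the first column of every row is a reasonable way to keep sections apart. The function name write_product_csv and the column headers are hypothetical, and the sample data is copied from the desired outcome above.

```python
import csv
import io

# Sample data copied from the desired outcome; in practice this would come from the scraper.
sections = [
    ("Outdoor Furniture", [
        ("Wooden Table", "99 USD", "White"),
        ("Lawn Chair", "39 USD", "Black"),
    ]),
    ("Camping Gear", [
        ("Tent - 4 Person", "299 USD", "Camo"),
        ("Thermos", "19 USD", "Metallic"),
    ]),
]


def write_product_csv(sections, out):
    """Write one row per product, with its section title in the first column."""
    writer = csv.writer(out)
    writer.writerow(["section", "name", "price", "color"])
    for title, products in sections:
        for name, price, color in products:
            writer.writerow([title, name, price, color])


# Write to an in-memory buffer here; a real file works the same way.
buffer = io.StringIO()
write_product_csv(sections, buffer)
print(buffer.getvalue())
```

To write an actual file, the same function can be given a handle opened with open("products.csv", "w", newline=""), where the file name is only a placeholder.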