2

I'm trying to get the Adidas shoe link from a search page, but I can't figure out what I'm doing wrong.

I tried tags = soup.find("section", {"class": "productList"}).findAll("a"), but it doesn't work :(

I also tried printing every href, and the desired link is not among them :(

So I'm expecting it to print this:

https://www.tennisexpress.com/adidas-mens-adizero-ubersonic-50-yrs-ltd-tennis-shoes-off-white-and-signal-blue-62138


from bs4 import BeautifulSoup
import requests

url = "https://www.tennisexpress.com/search.cfm?searchKeyword=BB6892"

# Getting the webpage, creating a Response object.
response = requests.get(url)

# Extracting the source code of the page.
data = response.text

# Passing the source code to BeautifulSoup to create a BeautifulSoup object for it.
soup = BeautifulSoup(data, 'lxml')

# Extracting all the <a> tags into a list.
tags = soup.find("section", {"class": "productList"}).findAll("a")

# Extracting URLs from the attribute href in the <a> tags.
for tag in tags:
    print(tag.get('href'))

Here's the HTML for that link:

<section class="productList">
  <article class="productListing">
    <a class="product" href="//www.tennisexpress.com/adidas-mens-adizero-ubersonic-50-yrs-ltd-tennis-shoes-off-white-and-signal-blue-62138" title="Men`s Adizero Ubersonic 50 Yrs LTD Tennis Shoes Off White and Signal Blue" onmousedown="return nxt_repo.product_x('38698770','1');">
      <span class="sale">SALE</span>
      <span class="image">
        <img src="//www.tennisexpress.com/prodimages/78091-DEFAULT-m.jpg" alt="Men`s Adizero Ubersonic 50 Yrs LTD Tennis Shoes Off White and Signal Blue">
      </span>
      <span class="brand">Adidas</span>
      <span class="name">Men`s Adizero Ubersonic 50 Yrs LTD Tennis Shoes Off White and Signal Blue</span>
      <span class="pricing">
        <strong class="listPrice">$140.00</strong>
        <strong class="percentOff">0% OFF</strong>
        <strong class="salePrice">$139.95</strong>
      </span>
      <br>
    </a>
  </article>
</section>

3 Answers

2

If you inspect the Network tab in Chrome DevTools, you'll notice that the products you search for are fetched by a separate request to https://tennisexpress-com.ecomm-nav.com/search.js. You can see an example response here. As you can see, it's a mess, so I wouldn't follow this approach.

Your code couldn't see the products because that request is made by JavaScript (running in your browser) after the initial page load. Neither urllib nor requests can render that content on its own. However, you can do that with Requests-HTML, which has JavaScript support (it uses Chromium behind the scenes).
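One quick way to check this diagnosis is to parse the HTML exactly as the server sent it and look for the product anchors. Below is a minimal sketch using only the standard library. The names ProductLinkParser and static_product_links and both sample strings are hypothetical stand-ins, not taken from the site.

```python
from html.parser import HTMLParser


class ProductLinkParser(HTMLParser):
    """Collect href values from <a class="product"> tags in static HTML."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "a" and "product" in classes and attrs.get("href"):
            self.links.append(attrs["href"])


def static_product_links(html):
    """Return product hrefs present in the HTML as received, with no JavaScript run."""
    parser = ProductLinkParser()
    parser.feed(html)
    return parser.links


# Hypothetical samples: markup after JavaScript has filled in the products,
# versus markup a plain HTTP client might receive.
rendered = '<section class="productList"><a class="product" href="//example.com/shoe-1">Shoe</a></section>'
unrendered = '<section class="productList"></section><script src="search.js"></script>'

print(static_product_links(rendered))    # ['//example.com/shoe-1']
print(static_product_links(unrendered))  # []
```

If the list comes back empty for response.text but the links are visible in the browser, the content is most likely being added by a script after the page loads.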

Code using Requests-HTML:

from itertools import chain
from requests_html import HTMLSession

session = HTMLSession()
url = 'https://www.tennisexpress.com/search.cfm?searchKeyword=adidas+boost'
r = session.get(url)
r.html.render()

links = list(chain(*[prod.absolute_links for prod in r.html.find('.product')]))

Each element's absolute_links is a set, so I used chain to join all of those sets together and built a list from the result.

>>> links
['https://www.tennisexpress.com/adidas-mens-barricade-2018-boost-tennis-shoes-black-and-night-metallic-62110',
 'https://www.tennisexpress.com/adidas-mens-barricade-2018-boost-tennis-shoes-white-and-matte-silver-62109',
 ...
 'https://www.tennisexpress.com/adidas-mens-supernova-glide-7-running-shoes-black-and-white-41636',
 'https://www.tennisexpress.com/adidas-womens-adizero-boston-6-running-shoes-solar-yellow-and-midnight-gray-45268']
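To make the flattening step concrete, here is a small sketch of what chain does with a list of sets. The URLs in link_sets are hypothetical stand-ins for the absolute_links values, not real output from the site.

```python
from itertools import chain

# Hypothetical stand-ins for prod.absolute_links: one set of URLs per element.
link_sets = [
    {"https://example.com/shoe-a"},
    {"https://example.com/shoe-b"},
    set(),  # an element without links contributes nothing
]

# chain(*link_sets) and chain.from_iterable(link_sets) yield the same items;
# from_iterable just avoids unpacking the outer list into arguments.
links = list(chain.from_iterable(link_sets))
print(links)  # ['https://example.com/shoe-a', 'https://example.com/shoe-b']
```

Note that a set has no guaranteed order, so when an element carries more than one link the order within that element may vary between runs.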

Don't forget to install Requests-HTML with pip install requests-html.

8 Comments

Your answer really cleared up a lot of things! I tried to run your code and got an error: ModuleNotFoundError: No module named 'requests_html'
Sorry, forgot to mention you need to install this package. I updated my answer.
Hmm... I'm having trouble running pip install requests-html on macOS Sierra 10.12.6 with Python 3.6. It says: Failed building wheel for websockets
I guess you install packages globally and there are some conflicts. You can take a look at pipenv to manage packages independently in each of your projects.
probably you forgot about print()
1
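soup = BeautifulSoup(data, "html.parser")
# find() returns a single Tag (or None); find_all() returns a list, which has no get_text().
markup = soup.find("section", class_="productList")
markupContent = markup.get_text() if markup else ""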

So your code would look like this:

from urllib.request import urlopen
from bs4 import BeautifulSoup

url = "https://www.tennisexpress.com/search.cfm?searchKeyword=BB6892"

# In Python 3, urlopen lives in urllib.request.
r = urlopen(url).read()
soup = BeautifulSoup(r, "html.parser")
# find() returns a single Tag, or None if the section is missing.
productMarkup = soup.find("section", class_="productList")
product = productMarkup.get_text() if productMarkup else ""

17 Comments

Can I have the whole code? Or at least show me where to put it in my code above? Thank you!
Have you tested this?
@PaulaThomas I did. No luck. Maybe I'm not using it correctly; that's why I asked for the full code, or some guidance on where to place it in my code. Thank you, Paula.
docs.python.org/3/library/urllib.html Also, check the tag identifier as I mentioned.
@AnotherUser31 I couldn't get it working either. I suspect the page uses AJAX or similar.
0

Right, here's the solution:

import requests
from bs4 import BeautifulSoup as bs

url = "https://www.tennisexpress.com/mens-adidas-tennis-shoes"
req = requests.get(url)
soup = bs(req.text, 'lxml')  # lxml because page is more xml than html
# Product links carry class="product" (see the HTML in the question).
arts = soup.find_all("a", class_="product")

and that gives you a list of links to all the adidas tennis shoes! I'm sure you can manage from there.
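The href values in the question's HTML are protocol-relative (they start with //), so they need a scheme before they match the expected output. Here is a minimal sketch using urljoin from the standard library. The hrefs list is a hand-copied sample from the question, standing in for values pulled out of arts.

```python
from urllib.parse import urljoin

# The page the links were scraped from (same URL as in the snippet above).
base_url = "https://www.tennisexpress.com/mens-adidas-tennis-shoes"

# Sample href copied from the HTML in the question; in practice this list
# would be built from the href attribute of each scraped <a> tag.
hrefs = [
    "//www.tennisexpress.com/adidas-mens-adizero-ubersonic-50-yrs-ltd-tennis-shoes-off-white-and-signal-blue-62138",
]

# urljoin borrows the scheme from base_url for protocol-relative hrefs.
absolute_links = [urljoin(base_url, href) for href in hrefs]
print(absolute_links[0])
# https://www.tennisexpress.com/adidas-mens-adizero-ubersonic-50-yrs-ltd-tennis-shoes-off-white-and-signal-blue-62138
```

urljoin also resolves plain relative paths against the base URL, so the same line works if the site mixes link styles.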

3 Comments

So you're saying I can't get the shoe from that search page? For me it would be easier to find only the products I need, instead of going through the whole list to look for the one I want. I'm sure we can figure something out. I'll be back in 30 minutes, maybe we can chat. I have no idea how to start a chat room, though. Please help me with this.
I can't get it to give me a list of shoes :( confused
Just do links = [art['href'] for art in arts] to get a "list of shoes", but still that solution doesn't answer your question and does a completely different thing.
