
I am trying to scrape T-shirt prices from the following link: https://www.adidas.com/us/search?q=tshirt

On that page, I found this line in the browser's inspector:

<div class="gl-price-item gl-price-item--sale notranslate">$36</div>

This is what I ran in the Scrapy shell, and what I got back:

>>> fetch('https://www.adidas.com/us/search?q=tshirt')
2022-09-25 23:50:11 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.adidas.com/us/search?q=tshirt> (referer: None)
>>> response.css('div.gl-price-item.gl-price-item--sale.notranslate')
[]

I'd expect response.css('div.gl-price-item.gl-price-item--sale.notranslate') to return at least one item, because the page contains a div with those classes holding the text $36, but I am getting an empty list. Why is this happening?

What am I doing wrong here?

  • Please read how to ask before asking additional questions, and edit this question to make it appropriate for Stackoverflow. Commented Sep 26, 2022 at 3:56
  • docs.scrapy.org/en/latest/topics/… Commented Sep 26, 2022 at 4:04

1 Answer

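You are getting an empty list because the prices are loaded dynamically from an API by JavaScript after the page loads. Scrapy does not execute JavaScript, so that content is never in the HTML it downloads and the selector has nothing to match. You can still get all the required data by requesting the API directly with Scrapy.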
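To make the difference concrete, here is a minimal standard-library sketch. Both HTML strings are hypothetical stand-ins (not copied from the real site): one for what the server sends before any script runs, one for what the browser's inspector shows afterwards. The `ClassCounter` and `count_elements` names are made up for this illustration.

```python
from html.parser import HTMLParser


class ClassCounter(HTMLParser):
    """Count elements whose class attribute contains every wanted class."""

    def __init__(self, wanted):
        super().__init__()
        self.wanted = set(wanted)
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        classes = set((dict(attrs).get("class") or "").split())
        if self.wanted <= classes:
            self.count += 1


def count_elements(html, *class_names):
    """Return how many elements in html carry all of class_names."""
    parser = ClassCounter(class_names)
    parser.feed(html)
    return parser.count


# Hypothetical markup: what the server sends before any JavaScript runs...
static_html = (
    '<html><body><div id="app"></div>'
    '<script src="bundle.js"></script></body></html>'
)
# ...and what the inspector shows once the script has filled the page in.
rendered_html = (
    '<html><body><div id="app">'
    '<div class="gl-price-item gl-price-item--sale notranslate">$36</div>'
    '</div></body></html>'
)

print(count_elements(static_html, "gl-price-item", "gl-price-item--sale"))
print(count_elements(rendered_html, "gl-price-item", "gl-price-item--sale"))
```

Scrapy only ever sees the first kind of document, which is why the selector comes back empty even though the element is visible in the browser.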

Example:

import scrapy


class TestSpider(scrapy.Spider):
    name = 'test'

    def start_requests(self):
        # Send a browser-like User-Agent with the request.
        headers = {
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36'
        }

        # API endpoint that returns the search results as JSON.
        api_url = 'https://www.adidas.com/api/plp/content-engine/search?sitePath=us&query=tshirt'

        yield scrapy.Request(
            url=api_url,
            headers=headers,
            callback=self.parse,
            method='GET',
        )

    def parse(self, response):
        # response.json() needs Scrapy 2.2 or newer.
        resp = response.json()

        for item in resp['raw']['itemList']['items']:
            yield {
                'price': item['price'],
                'salePrice': item['salePrice'],
            }

Output:

{'price': 35, 'salePrice': 21}
2022-09-26 11:35:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.adidas.com/api/plp/content-engine/search?sitePath=us&query=tshirt>
{'price': 25, 'salePrice': 23}
2022-09-26 11:35:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.adidas.com/api/plp/content-engine/search?sitePath=us&query=tshirt>
{'price': 35, 'salePrice': 25}
2022-09-26 11:35:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.adidas.com/api/plp/content-engine/search?sitePath=us&query=tshirt>
{'price': 35, 'salePrice': 25}
2022-09-26 11:35:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.adidas.com/api/plp/content-engine/search?sitePath=us&query=tshirt>
{'price': 35, 'salePrice': 35}
2022-09-26 11:35:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.adidas.com/api/plp/content-engine/search?sitePath=us&query=tshirt>
{'price': 35, 'salePrice': 35}
2022-09-26 11:35:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.adidas.com/api/plp/content-engine/search?sitePath=us&query=tshirt>
{'price': 25, 'salePrice': 25}
2022-09-26 11:35:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.adidas.com/api/plp/content-engine/search?sitePath=us&query=tshirt>
{'price': 25, 'salePrice': 23}
2022-09-26 11:35:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.adidas.com/api/plp/content-engine/search?sitePath=us&query=tshirt>
{'price': 45, 'salePrice': 45}
2022-09-26 11:35:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.adidas.com/api/plp/content-engine/search?sitePath=us&query=tshirt>
{'price': 40, 'salePrice': 40}
2022-09-26 11:35:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.adidas.com/api/plp/content-engine/search?sitePath=us&query=tshirt>
{'price': 150, 'salePrice': 60}
2022-09-26 11:35:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.adidas.com/api/plp/content-engine/search?sitePath=us&query=tshirt>
{'price': 25, 'salePrice': 25}
2022-09-26 11:35:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.adidas.com/api/plp/content-engine/search?sitePath=us&query=tshirt>
{'price': 40, 'salePrice': 36}
2022-09-26 11:35:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.adidas.com/api/plp/content-engine/search?sitePath=us&query=tshirt>
{'price': 35, 'salePrice': 35}
2022-09-26 11:35:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.adidas.com/api/plp/content-engine/search?sitePath=us&query=tshirt>
{'price': 25, 'salePrice': 23}
2022-09-26 11:35:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.adidas.com/api/plp/content-engine/search?sitePath=us&query=tshirt>
{'price': 35, 'salePrice': 21}
2022-09-26 11:35:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.adidas.com/api/plp/content-engine/search?sitePath=us&query=tshirt>
{'price': 32, 'salePrice': 32}
2022-09-26 11:35:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.adidas.com/api/plp/content-engine/search?sitePath=us&query=tshirt>
{'price': 25, 'salePrice': 10}
2022-09-26 11:35:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.adidas.com/api/plp/content-engine/search?sitePath=us&query=tshirt>
{'price': 35, 'salePrice': 25}
2022-09-26 11:35:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.adidas.com/api/plp/content-engine/search?sitePath=us&query=tshirt>
{'price': 25, 'salePrice': 25}
2022-09-26 11:35:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.adidas.com/api/plp/content-engine/search?sitePath=us&query=tshirt>
{'price': 55, 'salePrice': 55}
2022-09-26 11:35:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.adidas.com/api/plp/content-engine/search?sitePath=us&query=tshirt>
{'price': 35, 'salePrice': 35}
2022-09-26 11:35:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.adidas.com/api/plp/content-engine/search?sitePath=us&query=tshirt>
{'price': 30, 'salePrice': 18}
2022-09-26 11:35:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.adidas.com/api/plp/content-engine/search?sitePath=us&query=tshirt>
{'price': 45, 'salePrice': 45}
2022-09-26 11:35:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.adidas.com/api/plp/content-engine/search?sitePath=us&query=tshirt>
{'price': 35, 'salePrice': 35}
2022-09-26 11:35:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.adidas.com/api/plp/content-engine/search?sitePath=us&query=tshirt>
{'price': 35, 'salePrice': 21}
2022-09-26 11:35:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.adidas.com/api/plp/content-engine/search?sitePath=us&query=tshirt>
{'price': 25, 'salePrice': 23}
2022-09-26 11:35:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.adidas.com/api/plp/content-engine/search?sitePath=us&query=tshirt>
{'price': 32, 'salePrice': 32}
2022-09-26 11:35:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.adidas.com/api/plp/content-engine/search?sitePath=us&query=tshirt>
{'price': 40, 'salePrice': 40}
2022-09-26 11:35:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.adidas.com/api/plp/content-engine/search?sitePath=us&query=tshirt>
{'price': 35, 'salePrice': 35}
2022-09-26 11:35:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.adidas.com/api/plp/content-engine/search?sitePath=us&query=tshirt>
{'price': 25, 'salePrice': 15}
2022-09-26 11:35:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.adidas.com/api/plp/content-engine/search?sitePath=us&query=tshirt>
{'price': 25, 'salePrice': 25}
2022-09-26 11:35:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.adidas.com/api/plp/content-engine/search?sitePath=us&query=tshirt>
{'price': 30, 'salePrice': 18}
2022-09-26 11:35:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.adidas.com/api/plp/content-engine/search?sitePath=us&query=tshirt>
{'price': 40, 'salePrice': 40}
2022-09-26 11:35:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.adidas.com/api/plp/content-engine/search?sitePath=us&query=tshirt>
{'price': 25, 'salePrice': 25}
2022-09-26 11:35:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.adidas.com/api/plp/content-engine/search?sitePath=us&query=tshirt>
{'price': 40, 'salePrice': 40}
2022-09-26 11:35:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.adidas.com/api/plp/content-engine/search?sitePath=us&query=tshirt>
{'price': 110, 'salePrice': 110}
2022-09-26 11:35:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.adidas.com/api/plp/content-engine/search?sitePath=us&query=tshirt>
{'price': 35, 'salePrice': 35}
2022-09-26 11:35:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.adidas.com/api/plp/content-engine/search?sitePath=us&query=tshirt>
{'price': 22, 'salePrice': 22}
2022-09-26 11:35:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.adidas.com/api/plp/content-engine/search?sitePath=us&query=tshirt>
{'price': 40, 'salePrice': 40}
2022-09-26 11:35:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.adidas.com/api/plp/content-engine/search?sitePath=us&query=tshirt>
{'price': 35, 'salePrice': 32}
2022-09-26 11:35:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.adidas.com/api/plp/content-engine/search?sitePath=us&query=tshirt>
{'price': 25, 'salePrice': 23}
2022-09-26 11:35:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.adidas.com/api/plp/content-engine/search?sitePath=us&query=tshirt>
{'price': 25, 'salePrice': 25}
2022-09-26 11:35:58 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.adidas.com/api/plp/content-engine/search?sitePath=us&query=tshirt>
{'price': 30, 'salePrice': 30}

... so on
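In the output above, many items have the same value for price and salePrice. As a rough sketch of one way to post-process the scraped items, the helper below keeps only the ones that are actually discounted and adds a percentage. The `discounted` function and the `discountPct` key are hypothetical names, not part of the API; the sample values are taken from the output above.

```python
def discounted(items):
    """Return only the items whose sale price is below the list price."""
    result = []
    for item in items:
        price = item.get("price")
        sale = item.get("salePrice")
        if price is None or sale is None:
            continue  # skip records with a missing field
        if sale < price:
            # discountPct is a made-up key added for illustration.
            pct = round(100 * (price - sale) / price)
            result.append({**item, "discountPct": pct})
    return result


sample = [
    {"price": 35, "salePrice": 21},
    {"price": 35, "salePrice": 35},
    {"price": 40, "salePrice": 36},
]

for row in discounted(sample):
    print(row)
```

The same logic could live in the spider's parse method or in an item pipeline; keeping it as a plain function just makes it easy to try without running a crawl.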

6 Comments

So the key point here is resp=response.json()?
And how did you get the link address https://www.adidas.com/api/plp/content-engine/search?sitePath=us&query=tshirt?
That's the API URL. As for how to find an API URL, there is a very useful discussion here: stackoverflow.com/questions/1820927/…
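The API URL in the answer is the endpoint path plus two query parameters, `sitePath` and `query`. A small sketch of building it for other search terms follows; `build_search_url` is a hypothetical helper, and whether the endpoint accepts other values in the same way is an assumption that has not been checked here.

```python
from urllib.parse import urlencode

# Endpoint path taken from the answer above.
API_BASE = "https://www.adidas.com/api/plp/content-engine/search"


def build_search_url(query, site_path="us"):
    """Build the search API URL for a given query (hypothetical helper)."""
    params = {"sitePath": site_path, "query": query}
    return f"{API_BASE}?{urlencode(params)}"


print(build_search_url("tshirt"))
```

Using urlencode rather than string concatenation takes care of escaping spaces and other special characters in the search term.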
Thank you for the answer! I have another question. I am not getting output like yours, with the prices and sale prices. Instead, I am getting 2022-09-26 17:28:17 [scrapy.downloadermiddlewares.robotstxt] DEBUG: Forbidden by robots.txt: <GET https://www.adidas.com/api/plp/content-engine/search?sitePath=us&query=tshirt>. I tried changing the user_agent, but to no avail. Any advice?
Go to your project's settings.py file and set ROBOTSTXT_OBEY = False instead of True.
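The setting this comment refers to is named `ROBOTSTXT_OBEY` in Scrapy. A minimal sketch of the relevant line, assuming a project created from Scrapy's default template (which sets it to True):

```python
# settings.py (project-level Scrapy settings)

# When True, Scrapy's robots.txt middleware drops any request that the
# site's robots.txt disallows, which produces the
# "Forbidden by robots.txt" log line. Setting it to False turns that
# check off, so the spider no longer honors robots.txt.
ROBOTSTXT_OBEY = False
```

The same key can also be set for a single spider through its `custom_settings` class attribute, for example `custom_settings = {"ROBOTSTXT_OBEY": False}`, instead of changing it for the whole project.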