Using Scrapy to scrape nested JSON data?

Question

I am trying to write a web app that crawls information from Sony's PlayStation Store. I've found the JSON file that has the data I want, but how do I use Scrapy to store only certain elements of that file?

Here's part of the JSON data:

{
  "age_limit": 0,
  "attributes": {
    "facets": {
      "platform": [
        {"name": "PS4™", "count": 96, "key": "ps4"},
        {"name": "PS3™", "count": 5, "key": "ps3"},
        {"name": "PS Vita", "count": 7, "key": "vita"}
      ]
    }
  }
}

I only want the "count" value for the entry whose "name" is PS4™. How would I get this in Scrapy? Here is my Scrapy code so far:

import json

import scrapy

from crossbuy.items import PS4Vita


class PS4VitaSpider(scrapy.Spider):
    name = "ps4vita"  # Name of the spider, to be used when crawling
    allowed_domains = ["store.playstation.com"]  # Where the spider is allowed to go
    # Scrapy reads the start_urls attribute, which must be a list
    start_urls = ["https://store.playstation.com/chihiro-api/viewfinder/US/en/999/STORE-MSF77008-9_PS4PSVCBBUNDLE?size=30&gkb=1&geoCountry=US"]

    def parse(self, response):
        # Parse the response body, not the Response object itself
        jsonresponse = json.loads(response.body)

        pass  # To be changed later

Thanks!

Can't you just access the {"name": "PS4™"} entry in the normal way? E.g. [p["count"] for p in jsonresponse["attributes"]["facets"]["platform"] if p["name"] == "PS4™"]
– Anzel, Commented Apr 1, 2016 at 22:27

eLRuLL · Accepted Answer · 2016-04-01 22:25:05Z

2

...
def parse(self, response):
    jsonresponse = json.loads(response.body)
    my_count = None
    for platform in jsonresponse['attributes']['facets']['platform']:
        if 'PS4' in platform['name']:
            my_count = platform['count']

    yield dict(count=my_count)
...
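
To try the same lookup outside a spider, here is a minimal standalone sketch. It assumes the payload has the shape shown in the question. The helper name count_for_platform and the SAMPLE string are made up for illustration and are not part of Scrapy.

```python
import json

# Sample payload in the shape shown in the question (illustrative data only).
SAMPLE = """
{
  "age_limit": 0,
  "attributes": {
    "facets": {
      "platform": [
        {"name": "PS4™", "count": 96, "key": "ps4"},
        {"name": "PS3™", "count": 5, "key": "ps3"},
        {"name": "PS Vita", "count": 7, "key": "vita"}
      ]
    }
  }
}
"""


def count_for_platform(payload, name_fragment):
    """Return the count of the first platform whose name contains
    name_fragment, or None if no such platform exists."""
    # Chained .get() calls tolerate missing intermediate keys.
    platforms = payload.get("attributes", {}).get("facets", {}).get("platform", [])
    for platform in platforms:
        if name_fragment in platform.get("name", ""):
            return platform.get("count")
    return None


data = json.loads(SAMPLE)
ps4_count = count_for_platform(data, "PS4")
```

Inside parse, the same helper could be called on json.loads(response.body) and its result yielded as dict(count=...).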



Benjamin James · Answer · 2016-04-01 22:30:11Z

0

Simply access the JSON data as you would a Python dictionary:

# To get a list of the counts:
counts = [x['count'] for x in jsonresponse['attributes']['facets']['platform']]
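
If you need the count for one specific platform rather than the whole list, one option is to build a lookup keyed on the "key" field. This is only a sketch, assuming every entry carries "key" and "count" as in the question's sample. The platforms list below is hand-copied sample data, not a live response.

```python
# Hand-copied from the question's sample; in a spider this would be
# jsonresponse['attributes']['facets']['platform'].
platforms = [
    {"name": "PS4™", "count": 96, "key": "ps4"},
    {"name": "PS3™", "count": 5, "key": "ps3"},
    {"name": "PS Vita", "count": 7, "key": "vita"},
]

# Map each platform key to its count.
counts_by_key = {p["key"]: p["count"] for p in platforms}

# .get() returns None instead of raising if the key is absent.
ps4_count = counts_by_key.get("ps4")
```

Matching on "key" avoids depending on the trademark symbol in "name".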

