How to Scrape JSON Data Using Scrapy

Question

I'm using Scrapy and trying to test my selector in the Scrapy shell, but nothing is working. I'm trying to scrape the JSON data on this page:

https://web.archive.org/web/20180604230058/https://api.simon.com/v1.2/tenant?mallId=231&key=40A6F8C3-3678-410D-86A5-BAEE2804C8F2&lw=true

I've tried to scrape the data using this selector:

    response.css("body > pre::text").extract()

However, this doesn't seem to be working. Not sure what's wrong...

Ideally, I just want to get all the "Name: XXX" elements from the JSON data, so if you know how to select those specifically, that would be very helpful as well!

Currently my code looks like this:

    # -*- coding: utf-8 -*-
    import sys    # needed to extend the module search path for xlrd
    # append() adds one path; extend() with a string would add each character separately
    sys.path.append("/Users/YoungFreeesh/anaconda3/lib/python3.6/site-packages/")

    import scrapy  # needed to scrape
    import xlrd    # used to easily import xlsx file

    class AmazonbotSpider(scrapy.Spider):
        name = 'ArchiveSpider'

        allowed_domains = ['web.archive.org']
        start_urls = ['https://web.archive.org/web/20180604230058/https://api.simon.com/v1.2/tenant?mallId=231&key=40A6F8C3-3678-410D-86A5-BAEE2804C8F2&lw=true']

        def parse(self, response):
            print(response.body)

Re: "this doesn't seem to be working": not sure anyone is a mind reader here. I could be wrong though... (l'L'l, commented Jun 11, 2018 at 20:16)

I checked the network log, and the page loads the JSON file from this URL: web.archive.org/web/20180604230058if_/https://api.simon.com/… The only difference between the two URLs is 'if_'. See if this pattern matches the other links you have; you can use this hack to get your data. (sP_, commented Jun 11, 2018 at 20:19)
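A minimal sketch of the URL rewrite sP_ describes, assuming every snapshot URL has the same shape as the one in the question (https://web.archive.org/web/TIMESTAMP/ORIGINAL_URL). The function name to_iframe_url is hypothetical; it inserts if_ right after the timestamp (replacing any two-letter modifier already there), which is the only difference the comment reports between the two URLs.

```python
import re

# Matches a Wayback Machine snapshot URL: the numeric timestamp, an optional
# existing modifier such as "if_", then the original URL.
# Assumption: other snapshot links follow this same shape.
_WAYBACK_RE = re.compile(r"^(https?://web\.archive\.org/web/)(\d+)([a-z]{2}_)?/(.+)$")


def to_iframe_url(snapshot_url: str) -> str:
    """Hypothetical helper: insert the 'if_' modifier after the snapshot timestamp."""
    match = _WAYBACK_RE.match(snapshot_url)
    if match is None:
        raise ValueError(f"not a Wayback snapshot URL: {snapshot_url!r}")
    prefix, timestamp, _modifier, original = match.groups()
    # Any existing modifier is replaced, so calling this twice is harmless.
    return f"{prefix}{timestamp}if_/{original}"


if __name__ == "__main__":
    url = ("https://web.archive.org/web/20180604230058/https://api.simon.com/"
           "v1.2/tenant?mallId=231&key=40A6F8C3-3678-410D-86A5-BAEE2804C8F2&lw=true")
    print(to_iframe_url(url))
```

Putting the rewritten URL in start_urls should, going by the comment, make parse receive the archived JSON directly instead of the Wayback wrapper page; that is an assumption based on the comment, not something verified here.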

nosklo · Accepted Answer · 2018-06-11 20:18:42Z

Since the content is inside an iframe, it is a separate page, so you have to navigate to the iframe first, just as you would follow a link. Something like this:

    urls = response.css('iframe::attr(src)').extract()
    for url in urls:
        # the iframe src may be relative, so resolve it against the page URL
        yield scrapy.Request(response.urljoin(url), callback=self.parse_iframe)

Then define a new parse_iframe method in which you parse the iframe's response.
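For the parsing step itself, CSS selectors aren't needed once you have the raw JSON: json.loads(response.body) turns it into Python lists and dicts (json.loads accepts bytes on Python 3.6+). The question doesn't show the structure of the API response, so this sketch, built around a hypothetical helper iter_names, walks any nesting and yields every value stored under a key named "name", compared case-insensitively because the question writes it as "Name".

```python
import json
from typing import Any, Iterator


def iter_names(data: Any, key: str = "name") -> Iterator[Any]:
    """Yield every value stored under `key` (case-insensitive) anywhere in parsed JSON."""
    if isinstance(data, dict):
        for k, v in data.items():
            if isinstance(k, str) and k.lower() == key.lower():
                yield v
            # Keep searching inside the value too, in case names are nested.
            yield from iter_names(v, key)
    elif isinstance(data, list):
        for item in data:
            yield from iter_names(item, key)


if __name__ == "__main__":
    # Made-up sample; the real API response shape is not shown in the question.
    sample = '[{"Name": "Store A", "id": 1}, {"Name": "Store B", "id": 2}]'
    print(list(iter_names(json.loads(sample))))
```

Inside parse_iframe this could be used as: for name in iter_names(json.loads(response.body)): yield {"name": name}. Walking the whole structure trades a little speed for not having to guess where the names live; once the real shape is known, indexing it directly is simpler.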

answered Jun 11, 2018 at 20:18

nosklo

Here is a similar question: stackoverflow.com/questions/52779161/… Could you please answer? (Debbie, over a year ago)
