Extracting attributes with Python out of json API response

Question

I am using this code to call the NY Times API and get the JSON data about the articles matching a chosen search query.

import urllib
import re
import json

htmltext = urllib.urlopen('******http://call******')

data = json.load(htmltext)

print data

It prints out a result structured like this:

{u'status': u'OK', u'response': {u'docs': [{u'type_of_material': u'Article', u'blog': [], u'news_desk': None, u'lead_paragraph': u'Hon. PRESTON KING, one of the ablest, most upright, and most influential members of the Democratic party of this State, thus expression his opinion in regard to political prospects in a letter to a friend: OGDENSBURG, Saturday, Sept. 16, 1854.', u'headline': {u'main': u'POLITICAL.; New-York Politics--Letter from Preston King.'}, u'abstract': u'Letter to Jerry Rescue Celebration', u'print_page': u'8', u'word_count': 1526, u'_id': u'4fbfd3e945c1498b0d00ddca', u'snippet': u'Hon. PRESTON KING, one of the ablest, most upright, and most influential members of the Democratic party of this State, thus expression his opinion in regard to political prospects in a letter to a friend: OGDENSBURG, Saturday, Sept. 16, 1854.', u'source': u'The New York Times', u'web_url': u'http://query.nytimes.com/gst/abstract.html?res=950CE6DE1238EE3BBC4B53DFB667838F649FDE', u'multimedia': [], u'subsection_name': None, u'keywords': [{u'name': u'persons', u'value': u'KING, PRESTON'}, {u'name': u'persons', u'value': u'SUMNER CHARLES'}, {u'name': u'persons', u'value': u'BEECHER, HENRY WARD'}], u'byline': None, u'document_type': u'article', u'pub_date': u'1854-10-03T00:03:58Z', u'section_name': None}, {u'type_of_material': u'Article', u'blog': [], u'news_desk': None, u'lead_paragraph': u'MISSISSIPPI LAWS IN WANT OF REMODELING. GOVERNOR McWILLIE, of Mississippl, has summened an extra session of the State Legislature, to assemble on the first Monday in November next, In this State, as in others which have adopted the system of biennial sessions of their Legislatures. the plan has not been found to be the best for the interests of the people.', u'headline': {u'main': u'Article 1 -- No Title', u'kicker': u'1'}, u'abstract': None, u'print_page': u'3', u'word_count': 334, u'_id': u'4fbfe29945c1498b0d04bed8', u'snippet': u'MISSISSIPPI LAWS IN WANT OF REMODELING. 
GOVERNOR McWILLIE, of Mississippl, has summened an extra session of the State Legislature, to assemble on the first Monday in November next, In this State, as in others which have adopted the system of biennial...', u'source': u'The New York Times', u'web_url': u'http://query.nytimes.com/gst/abstract.html?res=9F06E7D61331EE34BC4952DFBE668383649FDE', u'multimedia': [], u'subsection_name': None, u'keywords': [], u'byline': None, u'document_type': u'article', u'pub_date': u'1858-08-11T00:03:58Z', u'section_name': None}, ... u'meta': {u'hits': 150, u'offset': 0, u'time': 38}}, u'copyright': u'Copyright (c) 2013 The New York Times Company.  All Rights Reserved.'}

For the sake of example, I have pasted the data for just 2 articles here (the API actually returns data for 10 articles).

Now I want to parse that data and extract all the 'web_url' attributes. How can I do that?

I tried this code:

import urllib
import re
import json

htmltext = urllib.urlopen('******http://call******')

data = json.load(htmltext)

print data['web_url']

But it gives me this error:

Traceback (most recent call last):
    File "json_trying.py", line 10, in <module>
      print data["web_url"]
KeyError: 'web_url'

falsetru · Accepted Answer · 2014-07-26 11:13:45Z

3

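Take time to look at the structure of the response. web_url is not a top-level key: it belongs to each item of the list stored under data['response']['docs'].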

{
    u'status': u'OK',
    u'response': {
        u'docs': [
            {
                ...
                u'web_url': u'http://query.nytimes.com/...',
                ...
            },
            {
                ...
            }
        ]
    }
}

for doc in data['response']['docs']:
    print doc['web_url']
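
If the URLs are needed as a list rather than printed one by one, the same loop can be written as a comprehension. This is only a sketch under a few assumptions: `sample` is a hand-trimmed stand-in for the parsed response with placeholder URLs, `extract_web_urls` is a hypothetical helper name, and `.get()` is used so that a missing level gives an empty result instead of a KeyError. It uses `print()` so that it also runs on Python 3.

```python
def extract_web_urls(data):
    """Return the web_url of every doc in a response shaped like the one above."""
    # .get() with a default avoids a KeyError when a level is missing.
    docs = data.get('response', {}).get('docs', [])
    # Skip docs that have no web_url (or an empty one).
    return [doc['web_url'] for doc in docs if doc.get('web_url')]


# Hand-trimmed stand-in for the parsed response; the URLs are placeholders.
sample = {
    'status': 'OK',
    'response': {
        'docs': [
            {'web_url': 'http://example.com/a', 'word_count': 1526},
            {'web_url': 'http://example.com/b', 'word_count': 334},
            {'word_count': 10},  # no web_url, so this doc is skipped
        ],
        'meta': {'hits': 150, 'offset': 0, 'time': 38},
    },
}

for url in extract_web_urls(sample):
    print(url)
```

With the real response, `extract_web_urls(data)` would be called on the dictionary returned by `json.load`.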


2 Comments

bcrvc Over a year ago

Yeah, I'm saving it in Sublime 2 and it's all on one line, so I have huge problems understanding the structure of the data. Your solution works perfectly!

falsetru Over a year ago

@loop_digga, you can get more readable output by using pprint.pprint instead of a plain print.
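
A minimal sketch of the pprint suggestion, assuming a hand-trimmed stand-in for the parsed response (the URL is a placeholder). The `depth` argument is optional; it is used here to show only the outer levels, which makes it easier to see where the docs list sits before looking at the details.

```python
import pprint

# Hand-trimmed stand-in for the parsed response; the URL is a placeholder.
data = {
    'status': 'OK',
    'response': {
        'docs': [
            {'web_url': 'http://example.com/a', 'keywords': []},
        ],
        'meta': {'hits': 150, 'offset': 0, 'time': 38},
    },
}

# depth=2 keeps the two outermost levels; deeper containers are
# collapsed to '...' so only the overall shape is shown.
outline = pprint.pformat(data, depth=2)
print(outline)

# Without depth, the whole structure is printed, nested and indented.
pprint.pprint(data)
```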

netizen · Answer · 2014-07-26 12:49:23Z

1

Use http://jsonviewer.stack.hu/: paste the JSON response there to see the structure first.

Then you might have to use a for loop, something like:

for eachobj in jsonresponse['response']['docs']:
    # parse the remaining items of each object here
    print eachobj['web_url']
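
The structure can also be inspected locally, without pasting the response into a website, by re-serializing it with indentation. This is a sketch only: `raw` is a made-up, heavily shortened stand-in for the text returned by the API, and the URL in it is a placeholder.

```python
import json

# Made-up, shortened stand-in for the raw JSON text returned by the API.
raw = ('{"status": "OK", "response": {"docs": [{"web_url": '
       '"http://example.com/a", "keywords": []}], "meta": {"hits": 150}}}')

data = json.loads(raw)

# indent=4 puts each key on its own line, nested by level;
# sort_keys=True keeps the key order stable between runs.
pretty = json.dumps(data, indent=4, sort_keys=True)
print(pretty)
```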


Tim McDonald · Answer · 2014-07-26 11:19:36Z

0

I think the web_url key is nested under the response key, but not directly: each entry of the docs list has its own web_url, so you should be able to access the first one as data["response"]["docs"][0]["web_url"].
