
I am using this code to call the NY Times API and get the JSON data about the articles matching a chosen search query.

import urllib
import re
import json

htmltext = urllib.urlopen('******http://call******')

data = json.load(htmltext)

print data
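The snippet above is Python 2 (`urllib.urlopen`, the `print` statement). A rough Python 3 equivalent uses `urllib.request.urlopen` instead. This is only a sketch: `fetch_json` is a hypothetical helper name, and the real request URL (query parameters, API key) is not shown in the question, so it stays a placeholder here too.

```python
import json
import urllib.request


def fetch_json(url):
    """Open `url` and decode its body as JSON (hypothetical helper)."""
    with urllib.request.urlopen(url) as response:
        # json.load reads the whole body; bytes input is accepted on Python 3.6+
        return json.load(response)


# Usage (placeholder URL, substitute the real Article Search request):
# data = fetch_json('******http://call******')
# print(data)
```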

It prints out a result structured like this:

{u'status': u'OK', u'response': {u'docs': [{u'type_of_material': u'Article', u'blog': [], u'news_desk': None, u'lead_paragraph': u'Hon. PRESTON KING, one of the ablest, most upright, and most influential members of the Democratic party of this State, thus expression his opinion in regard to political prospects in a letter to a friend: OGDENSBURG, Saturday, Sept. 16, 1854.', u'headline': {u'main': u'POLITICAL.; New-York Politics--Letter from Preston King.'}, u'abstract': u'Letter to Jerry Rescue Celebration', u'print_page': u'8', u'word_count': 1526, u'_id': u'4fbfd3e945c1498b0d00ddca', u'snippet': u'Hon. PRESTON KING, one of the ablest, most upright, and most influential members of the Democratic party of this State, thus expression his opinion in regard to political prospects in a letter to a friend: OGDENSBURG, Saturday, Sept. 16, 1854.', u'source': u'The New York Times', u'web_url': u'http://query.nytimes.com/gst/abstract.html?res=950CE6DE1238EE3BBC4B53DFB667838F649FDE', u'multimedia': [], u'subsection_name': None, u'keywords': [{u'name': u'persons', u'value': u'KING, PRESTON'}, {u'name': u'persons', u'value': u'SUMNER CHARLES'}, {u'name': u'persons', u'value': u'BEECHER, HENRY WARD'}], u'byline': None, u'document_type': u'article', u'pub_date': u'1854-10-03T00:03:58Z', u'section_name': None}, {u'type_of_material': u'Article', u'blog': [], u'news_desk': None, u'lead_paragraph': u'MISSISSIPPI LAWS IN WANT OF REMODELING. GOVERNOR McWILLIE, of Mississippl, has summened an extra session of the State Legislature, to assemble on the first Monday in November next, In this State, as in others which have adopted the system of biennial sessions of their Legislatures. the plan has not been found to be the best for the interests of the people.', u'headline': {u'main': u'Article 1 -- No Title', u'kicker': u'1'}, u'abstract': None, u'print_page': u'3', u'word_count': 334, u'_id': u'4fbfe29945c1498b0d04bed8', u'snippet': u'MISSISSIPPI LAWS IN WANT OF REMODELING. GOVERNOR McWILLIE, of Mississippl, has summened an extra session of the State Legislature, to assemble on the first Monday in November next, In this State, as in others which have adopted the system of biennial...', u'source': u'The New York Times', u'web_url': u'http://query.nytimes.com/gst/abstract.html?res=9F06E7D61331EE34BC4952DFBE668383649FDE', u'multimedia': [], u'subsection_name': None, u'keywords': [], u'byline': None, u'document_type': u'article', u'pub_date': u'1858-08-11T00:03:58Z', u'section_name': None}, ... u'meta': {u'hits': 150, u'offset': 0, u'time': 38}}, u'copyright': u'Copyright (c) 2013 The New York Times Company.  All Rights Reserved.'}

For the sake of example, I have pasted the data for just 2 articles here (the API actually returns data for 10 articles).

Now I want to parse that data and extract all the 'web_url' values. How can I do that?

I tried this code:

import urllib
import re
import json

htmltext = urllib.urlopen('******http://call******')

data = json.load(htmltext)

print data['web_url']

But it gives me this error:

Traceback (most recent call last):
    File "json_trying.py", line 10, in <module>
      print data["web_url"]
KeyError: 'web_url'
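The `KeyError` happens because `web_url` is not a key of the top-level dictionary: the top level of the response above only holds `status`, `response` and `copyright`, and each `web_url` sits inside one of the article dictionaries in `data['response']['docs']`. A small sketch on a trimmed-down copy of that response (one document, most fields dropped, Python 3 syntax) makes the nesting visible:

```python
# Trimmed-down copy of the response shown above (most fields omitted)
data = {
    "status": "OK",
    "response": {
        "docs": [
            {
                "web_url": "http://query.nytimes.com/gst/abstract.html?res=950CE6DE1238EE3BBC4B53DFB667838F649FDE",
                "pub_date": "1854-10-03T00:03:58Z",
            },
        ],
        "meta": {"hits": 150, "offset": 0, "time": 38},
    },
    "copyright": "Copyright (c) 2013 The New York Times Company.  All Rights Reserved.",
}

print(sorted(data))              # ['copyright', 'response', 'status']
print("web_url" in data)         # False, hence the KeyError
print(sorted(data["response"]))  # ['docs', 'meta']
# The URL is one level further down, inside each element of the docs list
print(data["response"]["docs"][0]["web_url"])
```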

3 Answers


Take the time to look at the structure of the response.

{
    u'status': u'OK',
    u'response': {
        u'docs': [
            {
                ...
                u'web_url': u'http://query.nytimes.com/...',
                ...
            },
            {
                ...
            }
        ]
    }
}

# 'docs' is a list of article dicts nested under 'response'
for doc in data['response']['docs']:
    print(doc['web_url'])
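If you want the URLs collected in a list rather than printed one by one, the same loop can be written as a list comprehension. This is a sketch assuming the `data` dict from the question; `extract_web_urls` is a hypothetical name, and `dict.get` is used defensively so that a document without a `web_url` (not something the sample response shows) is skipped instead of raising `KeyError`.

```python
def extract_web_urls(data):
    """Return the web_url of every article in an Article Search response."""
    docs = data.get("response", {}).get("docs", [])
    # Keep only documents that actually carry a web_url
    return [doc["web_url"] for doc in docs if doc.get("web_url")]


# Usage with a minimal stand-in for the real response (made-up placeholder URLs)
sample = {
    "response": {
        "docs": [
            {"web_url": "http://query.nytimes.com/a"},
            {"web_url": "http://query.nytimes.com/b"},
            {"headline": {"main": "no url here"}},
        ]
    }
}
print(extract_web_urls(sample))  # ['http://query.nytimes.com/a', 'http://query.nytimes.com/b']
```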

2 Comments

Yeah, I'm saving it in Sublime 2 and it's all on one line, so I have huge problems understanding the structure of the data. Your solution works perfectly!
@loop_digga, instead of a plain print, use pprint.pprint to get more readable output.
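Both `pprint.pprint` and `json.dumps` with an `indent` argument are in the standard library and spread nested data over several lines. A quick sketch on a small stand-in for the response (not the real API output):

```python
import json
import pprint

data = {
    "status": "OK",
    "response": {"docs": [{"web_url": "http://query.nytimes.com/..."}], "meta": {"hits": 150}},
}

# pprint wraps nested containers onto separate lines once they exceed `width`
pprint.pprint(data, width=40)

# json.dumps re-serialises the data as indented JSON text
pretty = json.dumps(data, indent=2, sort_keys=True)
print(pretty)
```

The `json.dumps` form has the advantage that its output is valid JSON again, so it can be pasted straight into an online viewer.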

Use http://jsonviewer.stack.hu/: paste the JSON response there to see its structure first.

Then you will probably need a for loop, something like:

for eachobj in jsonresponse['response']['docs']:
    print(eachobj['web_url'])  # parse the remaining items here
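If you would rather not hard-code the path to the key, one option (a different technique from the loop above, shown only as a sketch) is to walk the decoded JSON recursively and collect every value stored under a given key, however deeply it is nested. `find_key` is a hypothetical helper name, and `yield from` requires Python 3.

```python
def find_key(obj, key):
    """Yield every value stored under `key` anywhere in nested dicts/lists."""
    if isinstance(obj, dict):
        for k, v in obj.items():
            if k == key:
                yield v
            # Recurse into the value too, in case it nests further
            yield from find_key(v, key)
    elif isinstance(obj, list):
        for item in obj:
            yield from find_key(item, key)


sample = {"response": {"docs": [{"web_url": "http://example.com/1"}, {"web_url": "http://example.com/2"}]}}
print(list(find_key(sample, "web_url")))  # ['http://example.com/1', 'http://example.com/2']
```

The trade-off is that it also picks up same-named keys in unrelated parts of the response, so the explicit path is safer once the structure is known.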



I think the web_url key is a child of the response key, so you should be able to access it as data["response"]["web_url"].

