I have a file containing a large number of nested JSON objects; a snippet is pasted below. I am trying to use Python to go through all of the objects in the file and pull out those that have at least one custom_feeds url value beginning with "http://commshare". Some objects will not have any custom feeds, and the others will have one or more custom feeds, each of which might or might not begin with the string I am searching for. Any help would be appreciated! I am very new to Python.

Example JSON:

 [{
    "empid": "12345",
    "values": {
      "custom_feeds": {
        "custom_feeds": [
          {
            "name": "Bulletins",
            "url": "http://infoXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"
          }
        ]
      },
      "gadgetTitle": "InfoSec Updates",
      "newWindow": false,
      "article_limit_value": 10,
      "show_source": true
    }
  },
  {
    "empid": "23456",
    "values": {
      "custom_feeds": {
        "custom_feeds": [
          {
            "name": "1 News",
            "url": "http://blogs.xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
          },
          {
            "name": "2 News",
            "url": "http://info.xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
          },
          {
            "name": "3 News",
            "url": "http://blogs.xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
          },
          {
            "name": "4 News",
            "url": "http://commshare.xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
          }
        ]
      },
      "gadgetTitle": "Org News",
      "newWindow": false,
      "article_limit_value": 10,
      "show_source": true
    }
  },  {
    "empid": "34567",
    "values": {
      "custom_feeds": {
        "custom_feeds": []
      },
      "gadgetTitle": "Org News",
      "newWindow": false,
      "article_limit_value": 10,
      "show_source": true
    }
  }]
  • Load it into Python using json.load(open('path/to/input/file')). Iterate through each object (dictionary) and check len(obj['values']['custom_feeds']['custom_feeds']). Commented Jul 31, 2017 at 18:38
  • I don't believe that's valid JSON Commented Jul 31, 2017 at 18:48
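
The two comments above can be checked with a small sketch. It assumes a hypothetical helper named load_items and a shortened one-object snippet rather than the real file. Note that because the top level of the example is an array, parsing it gives a list of dictionaries rather than a single dict.

```python
import json

def load_items(text):
    """Parse text as JSON; return the parsed value, or None if it is not valid JSON.

    load_items is a hypothetical helper introduced for this sketch.
    """
    try:
        return json.loads(text)
    except json.JSONDecodeError as err:
        print(f"Invalid JSON: {err}")
        return None

# Shortened stand-in for the file contents (one object with no feeds).
snippet = '[{"empid": "34567", "values": {"custom_feeds": {"custom_feeds": []}}}]'

items = load_items(snippet)
for obj in items:
    # Same lookup as in the comment: the inner list holds the feeds.
    feeds = obj["values"]["custom_feeds"]["custom_feeds"]
    print(obj["empid"], len(feeds))
```

If the text does not parse, the helper prints the decoder's error message, which includes the position where parsing failed.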

1 Answer

Assuming your file is named input.json and you want one object per matching feed, you could parse the JSON and build a new list of the feeds that meet your criteria using a list comprehension:

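import json

# Parse the whole file; the top-level JSON array becomes a list of dicts.
with open('input.json') as input_file:
    items = json.load(input_file)

# One entry per matching feed, tagged with the empid of the object it came from.
feeds = [
    {'name': feed['name'], 'url': feed['url'], 'empid': item['empid']}
    for item in items
    for feed in item['values']['custom_feeds']['custom_feeds']
    if feed['url'].startswith('http://commshare')
]

# With the example JSON from the question, only "4 News" matches.
assert feeds == [{'name': '4 News', 'url': 'http://commshare.xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx', 'empid': '23456'}]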
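
If you want the whole objects that have at least one matching feed (as the question originally asks) rather than one entry per feed, a variation with any() might look like the sketch below. The sample data, the shortened URLs, and the helper names get_feeds and has_matching_feed are assumptions made for illustration. The .get() calls are there on the assumption that some objects may lack the custom_feeds keys entirely.

```python
import json

# Sample data mirroring the structure in the question (URLs shortened for illustration).
SAMPLE = '''
[
  {"empid": "12345", "values": {"custom_feeds": {"custom_feeds": [
      {"name": "Bulletins", "url": "http://info.example"}]}}},
  {"empid": "23456", "values": {"custom_feeds": {"custom_feeds": [
      {"name": "1 News", "url": "http://blogs.example"},
      {"name": "4 News", "url": "http://commshare.example"}]}}},
  {"empid": "34567", "values": {"custom_feeds": {"custom_feeds": []}}},
  {"empid": "45678", "values": {}}
]
'''

def get_feeds(item):
    """Return the item's feed list, or [] if any level of nesting is missing."""
    return item.get("values", {}).get("custom_feeds", {}).get("custom_feeds", [])

def has_matching_feed(item, prefix="http://commshare"):
    """True if at least one of the item's feed URLs starts with prefix."""
    return any(feed.get("url", "").startswith(prefix) for feed in get_feeds(item))

items = json.loads(SAMPLE)

# Keep the full object whenever any of its feeds matches.
matching_items = [item for item in items if has_matching_feed(item)]
print([item["empid"] for item in matching_items])  # ['23456']
```

any() stops at the first matching feed, so an object with several commshare feeds still appears only once in the result.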

2 Comments

This is great, thanks! How can I add the corresponding empid so I know which empid is attached to each url?
One way would be to construct a new object in the comprehension with an empid field (edited to show example).
