Remove Python dict from JSON file response

Question

I've looked over a few resources, such as "Remove python dict item from nested json file", but I cannot get my code to work. As far as I understand the JSON below (a placeholder for a much longer dump), it is a dict containing a dict that contains further dicts, with lists scattered throughout. What I ultimately want printed to my terminal is the following:

Message: [Client ID] 
Link: "http://linkgoeshere.com"

Here's what I have so far:

ThreeLine= {u'hits': {u'hits': [{u'_id': u'THIS IS THE FIRST ONE',
                  u'_index': u'foo',
                  u'_score': None,
                  u'_source': {u'@timestamp': u'2015-12-21T16:59:40.000-05:00',
                               u'message': u'Application.INFO: [Client ID ] Information Link: http://google.com {"check1":121212} {"tags":{"sent":"15","HTML":"5661"},"person":"15651"}',
                               u'system': u'user-info'}},
                {u'_id': u'THIS IS THE SECOND ONE',
                  u'_index': u'two',
                  u'_score': None,
                  u'_source': {u'@timestamp': u'2015-12-12 T16:59:40.000-05:00',
                               u'message': u'Application.INFO: [Client ID ] Information Link: http://google.com {"check1":565656} {"tags":{"sent":"21","HTML":"4512"},"person":"15651"}',
                               u'system': u'user-info'}},
]}}

unpacking = ThreeLine['hits']['hits']  # we only care about the inner list of hits


for d in unpacking:
    newinfo= []
    narrow=[d["_source"] for d in unpacking if "_source" in d] 
    narrower=[d["message"] for d in narrow if "message" in d]
    newinfo.append(narrower)
print newinfo

Right now, with the code as it is, it'll print both entries, but it has a lot of random junk I don't care about, like all of the tags:

{"tags":{"sent":"21","HTML":"4512"},"person":"15651"}',

So, how do I further strip out those entries so I just wind up with the two lines I ultimately want out of this insanely nested mess? If anyone has ideas for how I can clean up the current code, I'm all ears and ready to learn!

Martijn Pieters · Accepted Answer · 2015-12-28 17:59:43Z

The 'tags' dictionary is not a dictionary. It is text embedded in the message string:

>>> ThreeLine['hits']['hits'][0]['_source']['message']
u'Application.INFO: [Client ID ] Information Link: http://google.com {"check1":121212} {"tags":{"sent":"15","HTML":"5661"},"person":"15651"}'
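
A quick way to confirm this is to inspect the value directly. The sketch below copies the first message from the question into a standalone variable (the names `message`, `tail` and `embedded` are only illustrative) and shows that the value is a string, and that the trailing `{"tags": ...}` part only becomes a dictionary once it is sliced out and parsed as JSON:

```python
import json

# The first message from the question, copied verbatim.
message = u'Application.INFO: [Client ID ] Information Link: http://google.com {"check1":121212} {"tags":{"sent":"15","HTML":"5661"},"person":"15651"}'

# The value is text, not a dict, so there is no 'tags' key to delete.
is_dict = isinstance(message, dict)
has_tags_text = '"tags"' in message

# Assumption: the tags object is the last thing in the string, so slicing
# from its opening brace to the end yields a complete JSON document.
tail = message[message.index('{"tags"'):]
embedded = json.loads(tail)

print(is_dict)           # False
print(has_tags_text)     # True
print(embedded['tags'])
```
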

You'll have to do some string parsing to remove that. You could use a regular expression:

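import re

# Group 1: the bracketed client ID; group 2: the URL after "Information Link: ".
id_and_link = re.compile(r'(\[[^]]+\]) Information Link: (https?://[\w/.]+)')

# Lazily yield each message string, skipping hits without a _source or message.
messages = (
    entry['_source']['message']
    for entry in ThreeLine['hits']['hits']
    if '_source' in entry and 'message' in entry['_source']
)
for message in messages:
    match = id_and_link.search(message)
    if not match:
        continue  # the message does not follow the expected layout
    id_, link = match.groups()
    print 'Message:', id_
    print 'Link:', link
    print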
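
If the loop prints nothing, the pattern is most likely not matching the real messages. The following is a minimal sketch, written for Python 3, that wraps the same idea in two hypothetical helpers (`extract_id_and_link` and `summarize` are names made up for this example) and collects the messages that fail to match so they can be inspected. It assumes the link ends at the first whitespace character, which is looser than the character class used above, and the `sample` data is a shortened stand-in for the real response:

```python
import re

# Assumption: the link runs up to the first whitespace character.
ID_AND_LINK = re.compile(r'(\[[^\]]+\]) Information Link: (https?://\S+)')


def extract_id_and_link(message):
    """Return (client_id, link), or None if the message has a different layout."""
    match = ID_AND_LINK.search(message)
    return match.groups() if match else None


def summarize(response):
    """Split the messages into parsed (client_id, link) pairs and unmatched text."""
    parsed, unmatched = [], []
    for entry in response.get('hits', {}).get('hits', []):
        message = entry.get('_source', {}).get('message')
        if message is None:
            continue  # hit without a _source or without a message
        result = extract_id_and_link(message)
        if result is None:
            unmatched.append(message)
        else:
            parsed.append(result)
    return parsed, unmatched


# Shortened stand-in data, not the real response.
sample = {'hits': {'hits': [
    {'_source': {'message': 'Application.INFO: [Client ID ] Information Link: http://google.com {"check1":121212}'}},
    {'_source': {'message': 'Application.INFO: something else entirely'}},
    {'_id': 'no source here'},
]}}

parsed, unmatched = summarize(sample)
for client_id, link in parsed:
    print('Message:', client_id)
    print('Link:', link)
    print()
for message in unmatched:
    print('No match:', message)
```

Printing the unmatched messages makes it easier to see how the real data differs from the sample and to adjust the pattern accordingly.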


3 Comments

Sam W Over a year ago

Hm, this doesn't give me anything. Any ideas why?

Martijn Pieters Over a year ago

@SamW: probably because your real data is subtly different and thus the regular expression doesn't match. Give us some message values that are real, rather than made up for the question.

Sam W Over a year ago

I got it figured out, Monday brain. :) Thank you so much!
