I've looked over a few resources, such as "Remove python dict item from nested json file", but cannot seem to get my code to work. From what I understand of my JSON below (which is a placeholder for a much longer dump), it's a dict containing a dict, which contains a list of dicts, each with yet another dict inside. What I ultimately want to see printed to my terminal is the following:

Message: [Client ID] 
Link: "http://linkgoeshere.com"

Here's what I have so far:

ThreeLine= {u'hits': {u'hits': [{u'_id': u'THIS IS THE FIRST ONE',
                  u'_index': u'foo',
                  u'_score': None,
                  u'_source': {u'@timestamp': u'2015-12-21T16:59:40.000-05:00',
                               u'message': u'Application.INFO: [Client ID ] Information Link: http://google.com {"check1":121212} {"tags":{"sent":"15","HTML":"5661"},"person":"15651"}',
                               u'system': u'user-info'}},
                {u'_id': u'THIS IS THE SECOND ONE',
                  u'_index': u'two',
                  u'_score': None,
                  u'_source': {u'@timestamp': u'2015-12-12 T16:59:40.000-05:00',
                               u'message': u'Application.INFO: [Client ID ] Information Link: http://google.com {"check1":565656} {"tags":{"sent":"21","HTML":"4512"},"person":"15651"}',
                               u'system': u'user-info'}},
]}}

unpacking= ThreeLine['hits']['hits'] #we only want to talk to the sort dictionary. 


for d in unpacking:
    newinfo= []
    narrow=[d["_source"] for d in unpacking if "_source" in d] 
    narrower=[d["message"] for d in narrow if "message" in d]
    newinfo.append(narrower)
print newinfo

Right now, with the code as it is, it'll print both entries, but it has a lot of random junk I don't care about, like all of the tags:

{"tags":{"sent":"21","HTML":"4512"},"person":"15651"}',

So, how do I further strip out those entries so I just wind up with the two lines I ultimately want out of this insanely nested mess? If anyone has ideas for how I can clean up the current code, I'm all ears and ready to learn!

1 Answer

The 'tags' data is not actually a dictionary. It is text embedded in the message string:

>>> ThreeLine['hits']['hits'][0]['_source']['message']
u'Application.INFO: [Client ID ] Information Link: http://google.com {"check1":121212} {"tags":{"sent":"15","HTML":"5661"},"person":"15651"}'

You'll have to do some string parsing to remove that. You could use a regular expression:

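import re

# Capture the bracketed client ID and the URL that follows "Information Link:".
id_and_link = re.compile(r'(\[[^]]+\]) Information Link: (https?://[\w/.]+)')

# Lazily pull the message string out of every hit that actually has one.
messages = (
    entry['_source']['message']
    for entry in ThreeLine['hits']['hits']
    if '_source' in entry and 'message' in entry['_source']
)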
for message in messages:
    match = id_and_link.search(message)
    if not match:
        continue
    id_, link = match.groups()
    print 'Message:', id_
    print 'Link:', link
    print
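
If it helps, here is a minimal sketch of the same approach wrapped in small functions, written for Python 3 (where `print` is a function). The names `ID_AND_LINK`, `extract_id_and_link`, `iter_messages` and `sample` are made up for illustration, and the slightly looser pattern (any run of whitespace, any URL characters up to the next space or `{`) is an assumption about what real messages might look like, not something confirmed by the question. Unlike the loop above, it reports messages that fail to match instead of skipping them silently, which can help when nothing is printed.

```python
import re

# Same idea as the pattern above, loosened a little: any whitespace between
# the parts, and the URL runs until the next space or '{' (an assumption).
ID_AND_LINK = re.compile(
    r'(\[[^\]]+\])\s+Information Link:\s+(https?://[^\s{]+)'
)


def extract_id_and_link(message):
    """Return (client_id, link) from a log message, or None if no match."""
    match = ID_AND_LINK.search(message)
    return match.groups() if match else None


def iter_messages(response):
    """Yield each hit's message string, skipping hits that lack one."""
    for hit in response.get('hits', {}).get('hits', []):
        message = hit.get('_source', {}).get('message')
        if message is not None:
            yield message


# Hypothetical, trimmed-down stand-in for the ThreeLine structure.
sample = {'hits': {'hits': [
    {'_source': {'message': 'Application.INFO: [Client ID ] Information Link: '
                            'http://google.com {"check1":121212} '
                            '{"tags":{"sent":"15"}}'}},
    {'_id': 'a hit with no _source'},
]}}

for message in iter_messages(sample):
    pair = extract_id_and_link(message)
    if pair is None:
        # Show which messages the pattern misses instead of dropping them.
        print('No match:', repr(message))
        continue
    print('Message:', pair[0])
    print('Link:', pair[1])
    print()
```

Using `.get()` with defaults means a hit without `_source` or `message` is skipped rather than raising `KeyError`, which mirrors the `in` checks in the generator expression above.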

3 Comments

Hm, this doesn't give me anything. Any ideas why?
@SamW: probably because your real data is subtly different and thus the regular expression doesn't match. Give us some message values that are real, rather than made up for the question.
I got it figured out, Monday brain. :) Thank you so much!
