3

I'm parsing a very large log file that has JSON embedded in it.

I see lines like this:

foo="{my_object:foo, bar:baz}" a=b c=d

The problem is that the embedded JSON can contain spaces, while outside of the JSON, spaces act as tuple delimiters (except where they have unquoted strings. Huzzah for whatever idiot thought that was a good idea). I'm not sure how to figure out where the JSON string ends without reimplementing large portions of a JSON parser.

Is there a json parser for Python where I can give it '{"my_object":"foo", "bar":"baz"} asdfasdf', and it can return ({'my_object' : 'foo', 'bar':'baz'}, 'asdfasdf') or am I going to have to reimplement the json parser by hand?

2
  • Would the end of the JSON string happen to be }" ? Commented Jul 25, 2013 at 2:00
  • Does your example accurately represent what lines really look like, quote-style and all? Because if so, it's not actually valid json and you'll have a hard time using any standard json libraries parsing it no matter what you do. Commented Jul 25, 2013 at 3:06

3 Answers

1

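Found a really cool answer: use json.JSONDecoder's scan_once function. It takes the string and a start index, and returns the decoded value along with the index just past it.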

In [30]: import json

In [31]: d = json.JSONDecoder()

In [32]: my_string = 'key="{"foo":"bar"}"more_gibberish'

In [33]: d.scan_once(my_string, 5)
Out[33]: ({u'foo': u'bar'}, 18)

In [37]: my_string[18:]
Out[37]: '"more_gibberish'

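Just be careful with the start index. It has to point at the opening brace; one character further and you get the first string back instead of the whole object: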

In [38]: d.scan_once(my_string, 6)
Out[38]: (u'foo', 11)
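
For what it's worth, scan_once is not mentioned in the json module documentation, while json.JSONDecoder.raw_decode is, and it returns the same kind of (value, end index) pair. Below is a minimal Python 3 sketch that wraps it to produce the (object, remainder) tuple asked for in the question. The helper name split_json_prefix is made up for illustration, and passing a start index as the second argument relies on CPython's raw_decode(s, idx=0) signature, which the documentation does not spell out.

```python
import json

_decoder = json.JSONDecoder()

def split_json_prefix(text, start=0):
    """Decode the JSON value that begins at text[start].

    Returns (value, remainder), where remainder is everything after the value.
    Raises json.JSONDecodeError if no valid JSON value starts at that index.
    """
    # raw_decode tolerates trailing data, unlike json.loads
    value, end = _decoder.raw_decode(text, start)
    return value, text[end:]

obj, rest = split_json_prefix('{"my_object":"foo", "bar":"baz"} asdfasdf')
print(obj)         # {'my_object': 'foo', 'bar': 'baz'}
print(repr(rest))  # ' asdfasdf'
```

Note that the remainder keeps its leading space, and that this only works when the embedded text is valid JSON (quoted keys and strings).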

0

Match everything around it.

>>> import re
>>> re.search('^foo="(.*)" a=.+ c=.+$', 'foo="{my_object:foo, bar:baz}" a=b c=d').group(1)
'{my_object:foo, bar:baz}'
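
The pattern above hardcodes the key names. If they are not known ahead of time, something along these lines might work instead. It is only a sketch: it assumes every embedded value looks like KEY="{...}" and that the closing }" is followed by whitespace or the end of the line, which breaks if a string inside the JSON happens to contain that sequence. The names EMBEDDED, PLAIN, and split_fields are invented for this example.

```python
import re

# Assumed shape: KEY="{...}" with the closing }" followed by whitespace or end of line
EMBEDDED = re.compile(r'(\w+)="(\{.*?\})"(?=\s|$)')
# Ordinary KEY=VALUE pairs whose VALUE contains no whitespace
PLAIN = re.compile(r'(\w+)=(\S+)')

def split_fields(line):
    """Return a dict of raw string values, without decoding the embedded text."""
    fields = {}

    def grab(match):
        # Record the brace-delimited value and cut it out of the line
        fields[match.group(1)] = match.group(2)
        return ''

    rest = EMBEDDED.sub(grab, line)
    for key, value in PLAIN.findall(rest):
        fields[key] = value
    return fields

print(split_fields('foo="{my_object:foo, bar:baz}" a=b c=d'))
# {'foo': '{my_object:foo, bar:baz}', 'a': 'b', 'c': 'd'}
```

Because it never parses the braces' contents, this also copes with the not-quite-JSON form shown in the question; the extracted string can be handed to json.loads afterwards when it is valid JSON.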

1 Comment

Good idea except that we don't know what the following keys are or the key that leads into the json string is.
0

Use shlex and json.

Something like:

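import shlex
import json

def decode_line(line):
    decoded = {}
    # shlex.split honors shell-style quoting, so a quoted value stays in one field
    fields = shlex.split(line)
    for f in fields:
        k, v = f.split('=', 1)
        # "foo" is the key known to carry the JSON payload
        if k == "foo":
            v = json.loads(v)
        decoded[k] = v
    return decoded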

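This does assume that the JSON inside the quotes is quoted so that shlex can read it as one field, for example wrapped in single quotes or with its inner double quotes escaped.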

Here's a short example program that uses the above:

# shlex.quote is the Python 3 replacement for pipes.quote (pipes was removed in Python 3.13)

testdict = {"hello": "world", "foo": "bar"}
line = 'foo=' + shlex.quote(json.dumps(testdict)) + ' a=b c=d'
print(line)
print(decode_line(line))

With output:

foo='{"hello": "world", "foo": "bar"}' a=b c=d
{'foo': {'hello': 'world', 'foo': 'bar'}, 'a': 'b', 'c': 'd'}
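
The shlex approach depends on shell-style quoting. For lines where the embedded JSON is wrapped in double quotes and its inner double quotes are not escaped, a rough alternative is to walk the line by hand and let json.JSONDecoder.raw_decode find where the JSON ends. This is a sketch under several assumptions: the JSON is valid, embedded values always start with "{, plain values contain no whitespace, and the function name parse_line is invented here. Passing a start index to raw_decode relies on CPython's raw_decode(s, idx=0) signature.

```python
import json

_decoder = json.JSONDecoder()

def parse_line(line):
    """Parse KEY=VALUE pairs where a VALUE may be "<json object>" with unescaped quotes."""
    fields = {}
    pos, n = 0, len(line)
    while pos < n:
        if line[pos].isspace():
            pos += 1
            continue
        eq = line.index('=', pos)  # raises ValueError if a field has no '='
        key = line[pos:eq]
        pos = eq + 1
        if line.startswith('"{', pos):
            # Let the JSON decoder decide where the object ends
            value, pos = _decoder.raw_decode(line, pos + 1)
            if not line.startswith('"', pos):
                raise ValueError('expected closing quote at index %d' % pos)
            pos += 1
        else:
            # Plain value: runs until the next whitespace character
            end = pos
            while end < n and not line[end].isspace():
                end += 1
            value = line[pos:end]
            pos = end
        fields[key] = value
    return fields

print(parse_line('foo="{"my_object":"foo", "bar":"baz"}" a=b c=d'))
# {'foo': {'my_object': 'foo', 'bar': 'baz'}, 'a': 'b', 'c': 'd'}
```

No key names are hardcoded, so it does not matter which field carries the JSON or what follows it.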
