Parse valid JSON object or array from a string

Question

I have a string that can be one of two forms:

name multi word description {...}

or

name multi word description [...]

where {...} and [...] are any valid JSON. I am interested in parsing out just the JSON part of the string, but I'm not sure of the best way to do it (especially since I don't know which of the two forms the string will be). This is my current method:

import json

string = 'bob1: The ceo of the company {"salary": 100000}' 
o_ind = string.find('{')
a_ind = string.find('[')

if o_ind == -1 and a_ind == -1:
    print("Could not find JSON")
    exit(0)

index = min(o_ind, a_ind)
if index == -1:
    index = max(o_ind, a_ind)

# Use a separate name so the json module is not shadowed by the result
data = json.loads(string[index:])
print(data)

It works, but I can't help but feel like it could be done better. I thought maybe regex, but I was having trouble with it matching sub objects and arrays and not the outermost json object or array. Any suggestions?

I think it is simple and readable, rather than using a complex RegEx.
– thefourtheye, Commented Jan 23, 2016 at 5:29

Community · Accepted Answer · 2017-05-23 10:28:25Z

You can locate the start of the JSON by looking for a { or [ and then capturing everything from there to the end of the string in a capturing group:

>>> import re
>>> string1 = 'bob1: The ceo of the company {"salary": 100000}'
>>> string2 = 'bob1: The ceo of the company ["10001", "10002"]'
>>> 
>>> re.search(r"\s([{\[].*?[}\]])$", string1).group(1)
'{"salary": 100000}'
>>> re.search(r"\s([{\[].*?[}\]])$", string2).group(1)
'["10001", "10002"]'

Here the \s([{\[].*?[}\]])$ breaks down to:

\s - a single whitespace character
(...) - a capturing group
[{\[] - matches a single { or [ (the latter needs to be escaped with a backslash)
.*? - a non-greedy match for any characters (except a newline), any number of times
[}\]] - matches a single } or ] (the latter needs to be escaped with a backslash)
$ - the end of the string
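To turn the captured text into Python data, the pattern can be combined with json.loads. The following is only a sketch: the names JSON_TAIL and parse_tail are made up for illustration, and returning None on failure is an assumption about what the caller wants.

```python
import json
import re

# Pattern from the answer: one whitespace character, then a group that starts
# with "{" or "[" and runs to a "}" or "]" at the very end of the string.
JSON_TAIL = re.compile(r"\s([{\[].*?[}\]])$")

def parse_tail(line):
    """Return the parsed JSON at the end of line, or None if there is none."""
    match = JSON_TAIL.search(line)
    if match is None:
        return None  # no whitespace + bracket ... bracket tail found
    try:
        return json.loads(match.group(1))
    except json.JSONDecodeError:
        return None  # the tail looked like JSON but did not parse

print(parse_tail('bob1: The ceo of the company {"salary": 100000}'))
print(parse_tail('bob1: The ceo of the company ["10001", "10002"]'))
print(parse_tail('bob1'))
```

The regex only finds a candidate substring; json.loads is still what decides whether it is valid JSON.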

Or, you may use re.split() to split the string on a whitespace character followed by a { or [ (using a positive lookahead) and take the last item. It works for the sample input you've provided, but I'm not sure it is reliable in general:

>>> re.split(r"\s(?=[{\[])", string1)[-1]
'{"salary": 100000}'
>>> re.split(r"\s(?=[{\[])", string2)[-1]
'["10001", "10002"]'
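One input where taking the last item goes wrong is nested JSON with whitespace before an inner bracket, because the split then also happens inside the JSON. A possible workaround, assuming the JSON begins at the first whitespace followed by { or [, is to split only once with maxsplit=1. The names START and split_tail below are hypothetical.

```python
import re

# Same split pattern as above: whitespace followed by "{" or "[" (lookahead).
START = re.compile(r"\s(?=[{\[])")

def split_tail(line):
    """Return everything after the first whitespace that precedes { or [."""
    parts = START.split(line, maxsplit=1)
    return parts[1] if len(parts) == 2 else None

nested = 'bob1: The ceo of the company {"pay": {"salary": 100000}}'

# Splitting everywhere cuts inside the JSON and keeps only the inner tail.
print(START.split(nested)[-1])  # {"salary": 100000}}

# Splitting once keeps the whole outer object.
print(split_tail(nested))       # {"pay": {"salary": 100000}}
```

This still breaks if the description itself contains a space followed by a bracket, so it is a narrower fix rather than a general one.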

midori · Answer · 2016-01-23 20:54:26Z


You can use a simple alternation (|) in the regex to match either of the two substrings you need:

import re
import json

def json_from_s(s):
    match = re.findall(r"{.+[:,].+}|\[.+[,:].+\]", s)
    return json.loads(match[0]) if match else None

And some tests:

print(json_from_s('bob1: The ceo of the company {"salary": 100000}'))
print(json_from_s('bob1: The ceo of the company ["salary", 100000]'))
print(json_from_s('bob1'))
print(json_from_s('{1:}'))
print(json_from_s('[,1]'))

Output:

{'salary': 100000}
['salary', 100000]
None
None
None

edited Jan 23, 2016 at 20:54

answered Jan 23, 2016 at 7:10


8 Comments

Gillespie Over a year ago

Consider this case: 'bob1: The ceo of the company [{"salary": 100000}]'. The regex only matches the inner JSON object and not the outer JSON array.

midori Over a year ago

I only followed the OP's question and explanation.

Gillespie Over a year ago

I am the OP, and the explanation I gave is that the string can be of the form name multi word description [...]. The case I gave you above follows that pattern, but the regex fails to capture it.

midori Over a year ago

It doesn't fail to catch [...], as you can see from the tests. The case you gave in the comment above won't be caught by the accepted answer either, because your question didn't specify that a JSON object might be inside the list.

midori Over a year ago

If you just want to catch any JSON object in the string but not a list, remove the "or" part of the regex.
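For the nested case raised in these comments, one regex-free option is to let the JSON parser decide where the value ends. This is a sketch of a different technique from either answer: it uses json.JSONDecoder.raw_decode from the standard library, and the helper name extract_json is made up. It assumes you want the first bracket that starts valid JSON, and it tolerates text after the JSON, which is looser than the format described in the question.

```python
import json

def extract_json(text):
    """Return the first JSON object or array embedded in text, or None."""
    decoder = json.JSONDecoder()
    for i, ch in enumerate(text):
        if ch in "{[":
            try:
                # raw_decode parses one JSON value starting at index i and
                # returns it with the index where parsing stopped.
                value, _end = decoder.raw_decode(text, i)
            except json.JSONDecodeError:
                continue  # this bracket did not start valid JSON; keep scanning
            return value
    return None

print(extract_json('bob1: The ceo of the company [{"salary": 100000}]'))
print(extract_json('bob1: The ceo of the company {"salary": 100000}'))
print(extract_json('bob1'))
```

Because the parser tracks nesting itself, the outer array is returned whole rather than the inner object.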
