
I have the following JSON string:

{"FirstName":"John","LastName":"Smith"}

When I apply the following regex, it correctly returns the key-value pair groups:

(?<keyValuePair>(?<key>"\w+"):(?<value>".*?[^\\]"+?))+?

I get the matches:

1. "FirstName":"John"
    1.1 key:"FirstName"
    1.2 value:"John"
2. "LastName":"Smith"
    2.1 key:"LastName"
    2.2 value:"Smith"

Now I want a group for the whole object, i.e. to find all objects. On the same JSON string, I apply the following regex:

(?<object>{(?<properties>.*?)})

I get the matches:

1. {"FirstName":"John","LastName":"Smith"}
    1.1 object : {"FirstName":"John","LastName":"Smith"}
    1.2 properties : "FirstName":"John","LastName":"Smith"

What I want is to get the groups of the first regex as sub-groups of properties in the second regex.

So the expected result should be:

1. {"FirstName":"John","LastName":"Smith"}
    1.1 object : {"FirstName":"John","LastName":"Smith"}
    1.2 properties : "FirstName":"John","LastName":"Smith"
        1.2.1 "FirstName":"John"
            1.2.1.1 key : "FirstName"
            1.2.1.2 value : "John"
        1.2.2 "LastName":"Smith"
            1.2.2.1 key : "LastName"
            1.2.2.2 value : "Smith"

Could someone help me create a regex that produces the result above?

This would not count as a duplicate

I have tried many things over the past 3 hours and my mind is spinning.

  • Is there a reason you are using RegEx and not e.g. JSON.net? Commented Jul 25, 2014 at 6:26
  • Parsing JSON via Regex is not a good idea, I think. Why not just use Newtonsoft JSON and its JObject? Commented Jul 25, 2014 at 6:26
  • I have an application that gets data from the database and processes it in .NET. The query is provided by the user at runtime, so I don't know the schema. They created a custom parser that took around 4.5 secs to process that JSON. Newtonsoft took 6 secs. I want to bring the time down as far as possible, and I want to see if I can achieve that using Regex. Commented Jul 25, 2014 at 6:29
  • I assume you mean 6 seconds to parse a large chunk of JSON. The above sample is tiny. Commented Jul 25, 2014 at 6:50
  • The JSON is very large. For the query I'm executing, the network service that returns the data has an object with 197 string properties, and there are 6000 objects in total. Commented Jul 25, 2014 at 7:12

1 Answer


I have so far tried many things since the past 3 hours and my mind is spinning.

Not to be snide, not at all, but in 3 hours you could have written a recursive descent parser for JSON, or in about 30 minutes you could have installed JSON.NET, read the docs/samples and moved on to other things. Why not try that now? There is no future in parsing JSON with regex, because JSON is a context-free language: it is recursive and can be arbitrarily long and arbitrarily deeply nested. Regular expressions, in the classical sense, correspond to finite automata (DFA/NFA), which cannot recognize a context-free grammar with unbounded nesting. Sort of like parsing HTML with regex (ok, I couldn't resist).

Unless you have a very limited type of JSON and are absolutely against adding a third-party library, I wouldn't bother. Chalk it up to a learning experience.
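For the "very limited type of JSON" case, the nested groups asked for in the question can be approximated with two passes: an outer pattern that finds each flat object, and an inner pattern applied to its properties. The following is only a sketch in Python, not a general JSON parser. The names OBJECT_RE, PAIR_RE and extract_objects are made up for illustration, and it assumes objects are never nested, every value is a string, and no string contains a brace.

```python
import re

# Outer pattern: one flat object. Assumes no nested braces and no braces
# inside string values (an assumption, not something JSON guarantees).
OBJECT_RE = re.compile(r'\{(?P<properties>[^{}]*)\}')

# Inner pattern: "key":"value". The value accepts escaped characters such
# as \" and may be empty. Escape sequences are returned undecoded.
PAIR_RE = re.compile(r'"(?P<key>\w+)"\s*:\s*"(?P<value>(?:[^"\\]|\\.)*)"')


def extract_objects(text):
    """Return one dict per flat object found in text."""
    objects = []
    for obj in OBJECT_RE.finditer(text):
        pairs = PAIR_RE.finditer(obj.group('properties'))
        objects.append({m.group('key'): m.group('value') for m in pairs})
    return objects


sample = '{"FirstName":"John","LastName":"Smith"}'
print(extract_objects(sample))
# [{'FirstName': 'John', 'LastName': 'Smith'}]
```

As soon as the input contains nested objects, non-string values, or braces inside strings, these assumptions break, which is the point of the warning above.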


8 Comments

I have tried Newtonsoft, and we do have a custom JSON parser that performs better than Newtonsoft. However, as I have mentioned, they take 6 and 4 secs respectively, so I am looking for faster options and have turned to regex.
Hmm, the question is: will the regex be faster than a well-written custom parser? I have my doubts. JSON looks simple, but recursion could be deadly.
Is your potential JSON input very limited? As in, can you reasonably describe the possible inputs in one page? Even so, I'd still consider a hand-written recursive descent parser. It could be that the slow part is the lookup / mapping to the object properties, not the actual parsing.
The JSON will simply return an array of objects with simple string properties and no nesting.
Hmm, you wrote "so I don't know the schema", but it looks like you do know the schema, so I think a well-written custom parser will be more efficient than regex. (I think it is enough to parse the JSON char-by-char in a single pass.) Regex won't do it faster.
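The char-by-char, single-pass idea from the last comment could look roughly like the sketch below. It is written in Python for brevity rather than .NET, the function name parse_flat_objects is hypothetical, and it assumes well-formed input consisting of an array of flat objects whose keys and values are all strings. Whether it would beat the existing custom parser is something only measurement can tell.

```python
def parse_flat_objects(text):
    """Parse '[{"k":"v",...},...]' in one left-to-right pass.

    Assumes well-formed input, string-only keys and values, and no nested
    objects. Escape sequences are kept verbatim rather than decoded.
    """
    objects = []
    current = None  # dict being filled, or None when outside an object
    key = None      # pending key that is still waiting for its value
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        if ch == '{':
            current = {}
        elif ch == '}':
            objects.append(current)
            current = None
        elif ch == '"':
            # Scan to the closing quote, skipping over escaped characters.
            start = i + 1
            i = start
            while text[i] != '"':
                i += 2 if text[i] == '\\' else 1
            token = text[start:i]
            if key is None:
                key = token           # first string of a pair is the key
            else:
                current[key] = token  # second string is the value
                key = None
        # Brackets, commas, colons and whitespace are simply skipped.
        i += 1
    return objects


sample = '[{"FirstName":"John","LastName":"Smith"}]'
print(parse_flat_objects(sample))
# [{'FirstName': 'John', 'LastName': 'Smith'}]
```

Each character is visited once and there is no backtracking, which is the main structural difference from the regex approach.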