Recursive JSON parsing using JSON.NET

Question

I have the following JSON object returned from a JSON differ:

{
    "lastName": ["Bab", "Beb"],
    "middleName": ["Cg", "seeg"],
    "contact":
    {
        "emailAddress": ["[email protected]", "[email protected]"],
        "addresses":
        [
            {
                "state": ["AL", "AZ"]
            },
            {
                "state": ["TN", "MO"]
            }
        ]
    }
}

I need a list of changes in the following format:

lastName/new:Bab/old:Beb
middleName/new:Cg/old:seeg
contact.emailAddress/new:[email protected]/old:[email protected]
contact.addresses[0].state/new:AL/old:AZ
contact.addresses[1].state/new:TN/old:MO

So I wrote this ugly program using a bit of recursion.

private static IEnumerable<DocumentProperty> ParseJObject(JObject node)
{
    HashSet<DocumentProperty> documentProperties = new HashSet<DocumentProperty>();
    DocumentProperty documentProperty = new DocumentProperty();

    foreach (KeyValuePair<string, JToken> sub in node)
    {
        if (sub.Value.Type == JTokenType.Array)
        {
            // unnamed nodes which contain nested objects 
            if (sub.Value.First.Type == JTokenType.Object)
            {
                foreach (var innerNode in sub.Value.Children())
                {
                    documentProperties.UnionWith(ParseJObject((JObject)innerNode));
                }
            }

            documentProperty = CreateDocumentProperty(sub.Value);
        }
        else if (sub.Value.Type == JTokenType.Object)
        {
            documentProperties.UnionWith(ParseJObject((JObject)sub.Value));
        }

        documentProperties.Add(documentProperty);
    }

    return documentProperties;
}

It works, except that it produces some extra output:

lastName/new:Bab/old:Beb
middleName/new:Cg/old:seeg
contact.emailAddress/new:[email protected]/old:[email protected]
contact.addresses[0].state/new:AL/old:AZ
contact.addresses[1].state/new:TN/old:MO
contact.addresses/new:{                <-----------------------------Extra here.
  "state": [
    "AL",
    "AZ"
  ]
}/old:{
  "state": [
    "TN",
    "MO"
  ]
}

I suspect that it is due to how I have my recursion set up. Can you see what is wrong here?

Definition for CreateDocumentProperty

private static DocumentProperty CreateDocumentProperty(JToken subValue) => new DocumentProperty()
{
    PropertyName = subValue.Path,
    New = subValue[0].ToString(),
    Old = subValue[1].ToString()
};

Main method:

static void Main()
{
    JToken jToken = JToken.Parse("{\"lastName\":[\"Bab\",\"Beb\"],\"middleName\":[\"Cg\",\"seeg\"],\"contact\":{\"emailAddress\":[\"[email protected]\",\"[email protected]\"],\"addresses\":[{\"state\":[\"AL\",\"AZ\"]},{\"state\":[\"TN\",\"MO\"]}]}}");

    JObject inner = jToken.Value<JObject>();
    IEnumerable<DocumentProperty> data = ParseJObject(inner);

    foreach (var item in data) Console.WriteLine(item);
}

Is there a reason you can't just create a class (Model) and deserialize the JSON string into the Model?
– Ryan Wilson, Commented Oct 3, 2019 at 20:58
If it was a single type of JSON document, I would have done that and not resorted to this. These diffs come from Cosmos DB with a wide variety of JSON documents stored.
– Animesh D, Commented Oct 3, 2019 at 21:01
I still don't understand why you can't simply use JSON.NET to deserialize the JSON object that you've provided in your question. Create a model (or models) to fit your JSON schema and JsonConvert.DeserializeObject<MyModel>(jsonData).
– Tom Faltesek, Commented Oct 3, 2019 at 21:17

dbc · Accepted Answer · 2019-10-03 22:16:35Z

Rather than writing your own recursive code, you can use JContainer.DescendantsAndSelf() to find all new value/old value pairs, then transform them into strings with the required formatting using LINQ.

First, define the following extension method:

public static IEnumerable<string> GetDiffPaths(this JContainer root)
{
    if (root == null)
        throw new ArgumentNullException(nameof(root));
    var query = from array in root.DescendantsAndSelf().OfType<JArray>()
                where array.Count == 2 && array[0] is JValue && array[1] is JValue
                select $"{array.Path}/new:{array[0]}/old:{array[1]}";
    return query;
}

And then do:

var jContainer = jToken as JContainer;
if (jContainer == null)
    throw new JsonException("Input was not a container");

foreach (var item in jContainer.GetDiffPaths())
{
    Console.WriteLine(item);
}

Demo fiddle here.

Notes:

In the above code I am simply generating an enumerable of strings, but you could replace that with an enumerable of DocumentProperty objects (which was not fully included in your question).

My assumption is that any JSON array containing exactly two primitive values represents a new value / old value pair.

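Your code does not check for this correctly: CreateDocumentProperty is called for every array, including arrays of nested objects such as contact.addresses, which is where the extra output comes from. At a minimum, an else is missing in the following location: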

foreach (KeyValuePair<string, JToken> sub in node)
{
    if (sub.Value.Type == JTokenType.Array)
    {
        // unnamed nodes which contain nested objects 
        if (sub.Value.First.Type == JTokenType.Object)
        {
            foreach (var innerNode in sub.Value.Children())
            {
                documentProperties.UnionWith(ParseJObject((JObject)innerNode));
            }
        }
        else // ELSE WAS REQUIRED HERE
        {
            documentProperties.Add(CreateDocumentProperty(sub.Value));
        }
    }
    else if (sub.Value.Type == JTokenType.Object)
    {
        documentProperties.UnionWith(ParseJObject((JObject)sub.Value));
    }
}

Demo fiddle #2 here.

JContainer represents either a JSON array or a JSON object. My assumption is that the diff routine must return one or the other.
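
As the first note mentions, the string query can be swapped for one that yields DocumentProperty objects. The following is only a sketch under assumptions: DocumentProperty was not shown in the question, so its shape here (PropertyName, New, Old, plus a ToString override) is inferred from CreateDocumentProperty and the desired output, and the method name GetDiffProperties is made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Newtonsoft.Json.Linq;

// Assumed shape: only PropertyName, New and Old appear in the question.
public class DocumentProperty
{
    public string PropertyName { get; set; }
    public string New { get; set; }
    public string Old { get; set; }

    // Assumed formatting, matching the output requested in the question.
    public override string ToString() => $"{PropertyName}/new:{New}/old:{Old}";
}

public static class DiffExtensions
{
    // Hypothetical name; same filter as GetDiffPaths, but yields objects instead of strings.
    public static IEnumerable<DocumentProperty> GetDiffProperties(this JContainer root)
    {
        if (root == null)
            throw new ArgumentNullException(nameof(root));

        // Keep only two-element arrays whose items are both primitive values.
        return from array in root.DescendantsAndSelf().OfType<JArray>()
               where array.Count == 2 && array[0] is JValue && array[1] is JValue
               select new DocumentProperty
               {
                   PropertyName = array.Path,
                   New = array[0].ToString(),
                   Old = array[1].ToString()
               };
    }
}
```

It would be called the same way as GetDiffPaths, for example foreach (var item in jContainer.GetDiffProperties()) Console.WriteLine(item);. Arrays of nested objects such as contact.addresses fail the JValue check, so they are skipped rather than reported.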

edited Oct 3, 2019 at 22:16

answered Oct 3, 2019 at 21:49

dbc

2 Comments

Animesh D Over a year ago

Your assumption 2 is spot on. I used JSONDiffPatch.Net library which returns an array with new and old values for every property that has a difference between the old and new entities.

Animesh D Over a year ago

Also, adding else has solved the issue for this data point, but it still returns some garbage output for a few document types which had more nesting. My preliminary tests of the DescendantsAndSelf approach seem promising. Thanks for the time and effort you have put into this solution. I will mark this as the answer and continue to tune this for my needs.
