
I have the following JSON object returned from a JSON differ:

{
    "lastName": ["Bab", "Beb"],
    "middleName": ["Cg", "seeg"],
    "contact":
    {
        "emailAddress": ["[email protected]", "[email protected]"],
        "addresses":
        [
            {
                "state": ["AL", "AZ"]
            },
            {
                "state": ["TN", "MO"]
            }
        ]
    }
}

I need a list of changes in the following fashion.

lastName/new:Bab/old:Beb
middleName/new:Cg/old:seeg
contact.emailAddress/new:[email protected]/old:[email protected]
contact.addresses[0].state/new:AL/old:AZ
contact.addresses[1].state/new:TN/old:MO

So I wrote this ugly program using a bit of recursion.

private static IEnumerable<DocumentProperty> ParseJObject(JObject node)
{
    HashSet<DocumentProperty> documentProperties = new HashSet<DocumentProperty>();
    DocumentProperty documentProperty = new DocumentProperty();

    foreach (KeyValuePair<string, JToken> sub in node)
    {
        if (sub.Value.Type == JTokenType.Array)
        {
            // unnamed nodes which contain nested objects 
            if (sub.Value.First.Type == JTokenType.Object)
            {
                foreach (var innerNode in sub.Value.Children())
                {
                    documentProperties.UnionWith(ParseJObject((JObject)innerNode));
                }
            }

            documentProperty = CreateDocumentProperty(sub.Value);
        }
        else if (sub.Value.Type == JTokenType.Object)
        {
            documentProperties.UnionWith(ParseJObject((JObject)sub.Value));
        }

        documentProperties.Add(documentProperty);
    }

    return documentProperties;
}

It works, except that it produces some extra output.

lastName/new:Bab/old:Beb
middleName/new:Cg/old:seeg
contact.emailAddress/new:[email protected]/old:[email protected]
contact.addresses[0].state/new:AL/old:AZ
contact.addresses[1].state/new:TN/old:MO
contact.addresses/new:{                <-----------------------------Extra here.
  "state": [
    "AL",
    "AZ"
  ]
}/old:{
  "state": [
    "TN",
    "MO"
  ]
}

I suspect that it is due to how I have my recursion set up. Can you immediately make out what is wrong here?

Definition for CreateDocumentProperty

private static DocumentProperty CreateDocumentProperty(JToken subValue) => new DocumentProperty()
{
    PropertyName = subValue.Path,
    New = subValue[0].ToString(),
    Old = subValue[1].ToString()
};

Main method:

static void Main()
{
    JToken jToken = JToken.Parse("{\"lastName\":[\"Bab\",\"Beb\"],\"middleName\":[\"Cg\",\"seeg\"],\"contact\":{\"emailAddress\":[\"[email protected]\",\"[email protected]\"],\"addresses\":[{\"state\":[\"AL\",\"AZ\"]},{\"state\":[\"TN\",\"MO\"]}],}}");

    JObject inner = jToken.Value<JObject>();
    IEnumerable<DocumentProperty> data = ParseJObject(inner);

    foreach (var item in data) Console.WriteLine(item);
}
  • Is there a reason you can't just create a class (Model) and deserialize the JSON string into the Model? Commented Oct 3, 2019 at 20:58
  • If it was a single type of JSON document, I would have done that and not resorted to this. These diffs come from Cosmos DB with a wide variety of JSON documents stored. Commented Oct 3, 2019 at 21:01
  • I still don't understand why you can't simply use JSON.NET to deserialize the JSON object that you've provided in your question. Create a model (or models) to fit your JSON schema and JsonConvert.DeserializeObject<MyModel>(jsonData). Commented Oct 3, 2019 at 21:17

1 Answer

Rather than writing your own recursive code, you can use JContainer.DescendantsAndSelf() to find all new value/old value pairs, then transform them into strings with the required formatting using LINQ.

First, define the following extension method:

public static IEnumerable<string> GetDiffPaths(this JContainer root)
{
    if (root == null)
        throw new ArgumentNullException(nameof(root));
    var query = from array in root.DescendantsAndSelf().OfType<JArray>()
                where array.Count == 2 && array[0] is JValue && array[1] is JValue
                select $"{array.Path}/new:{array[0]}/old:{array[1]}";
    return query;
}

And then do:

var jContainer = jToken as JContainer;
if (jContainer == null)
    throw new JsonException("Input was not a container");

foreach (var item in jContainer.GetDiffPaths())
{
    Console.WriteLine(item);
}

Demo fiddle here.

Notes:

  1. In the above code I am simply generating an enumerable of strings, but you could replace that with an enumerable of DocumentProperty objects (which was not fully included in your question).

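  2. My assumption is that any JSON array containing exactly two primitive values represents a new value / old value pair.

    Your code does not check for this correctly. When an array contains objects, it recurses into them but then still passes the array itself to CreateDocumentProperty, which reports the first two objects as a new/old pair. At a minimum, I believe an else is missing in the following location: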

    foreach (KeyValuePair<string, JToken> sub in node)
    {
        if (sub.Value.Type == JTokenType.Array)
        {
            // unnamed nodes which contain nested objects 
            if (sub.Value.First.Type == JTokenType.Object)
            {
                foreach (var innerNode in sub.Value.Children())
                {
                    documentProperties.UnionWith(ParseJObject((JObject)innerNode));
                }
            }
            else // ELSE WAS REQUIRED HERE
            {
                documentProperties.Add(CreateDocumentProperty(sub.Value));
            }
        }
        else if (sub.Value.Type == JTokenType.Object)
        {
            documentProperties.UnionWith(ParseJObject((JObject)sub.Value));
        }
    }
    

    Demo fiddle #2 here.

  3. JContainer represents either a JSON array or a JSON object. My assumption is that the diff routine must return one or the other.
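
As a rough sketch of note 1, the same query can project to DocumentProperty objects instead of strings. The DocumentProperty class below is a hypothetical stand-in (the question did not include its definition), and the names GetDiffProperties and DiffExtensions are made up for illustration; only DescendantsAndSelf(), Path and the JArray indexer come from Json.NET.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Newtonsoft.Json.Linq;

// Hypothetical stand-in: the real DocumentProperty was not shown in the question.
public class DocumentProperty
{
    public string PropertyName { get; set; }
    public string New { get; set; }
    public string Old { get; set; }

    public override string ToString() => $"{PropertyName}/new:{New}/old:{Old}";
}

public static class DiffExtensions
{
    // Same filter as GetDiffPaths: any two-element array of primitive values
    // is assumed to be a new value / old value pair.
    public static IEnumerable<DocumentProperty> GetDiffProperties(this JContainer root)
    {
        if (root == null)
            throw new ArgumentNullException(nameof(root));
        return from array in root.DescendantsAndSelf().OfType<JArray>()
               where array.Count == 2 && array[0] is JValue && array[1] is JValue
               select new DocumentProperty
               {
                   PropertyName = array.Path,
                   New = array[0].ToString(),
                   Old = array[1].ToString()
               };
    }
}
```

Calling jContainer.GetDiffProperties() in place of GetDiffPaths() and writing each item to the console should then give the same lines as before, because ToString() reproduces the requested format. Arrays of objects, such as contact.addresses, are skipped by the filter rather than reported as a pair.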


2 Comments

Your assumption 2 is spot on. I used JSONDiffPatch.Net library which returns an array with new and old values for every property that has a difference between the old and new entities.
Also, adding the else solved the issue for this data point, but it still returns some garbage output for a few document types that have more nesting. My preliminary tests of the DescendantsAndSelf approach seem promising. Thanks for the time and effort you have put into this solution. I will mark this as the answer and continue to tune it for my needs.
