
I have an API that receives JSON calls and writes each one to its own file (800 KB-1 MB per call). I would like an hourly task that takes all of the JSON files from the last hour and combines them into a single file, to make daily/monthly analytics easier.

Each file consists of a collection of data, in the format [ object {property: value, ... ]. Because of this, I cannot do simple concatenation: the result would no longer be valid JSON (and adding a comma between files would produce a collection of collections). I would like to keep the memory footprint as low as possible, so I was looking at the example below and pushing each file to the stream after deserializing it with JsonConvert.DeserializeObject(fileContent); however, that leaves me with a collection of collections as well. I have also tried using a JArray instead of JsonConvert, merging into a list declared outside the foreach, but that gives the same result. If I move the Serialize call outside the foreach, it does work; however, I am worried about holding the 4-6 GB worth of items in memory.

In summary, I'm ending up with [ [ object {property: value, ... ],... [ object {property: value, ... ]] where my desired output would be [ object {property: value (file1), ... object {property: value (fileN) ].

        using (FileStream fs = File.Open(@"C:\Users\Public\Documents\combined.json", FileMode.CreateNew))
        {
            using (StreamWriter sw = new StreamWriter(fs))
            {
                using (JsonWriter jw = new JsonTextWriter(sw))
                {
                    jw.Formatting = Formatting.None;

                    JArray list = new JArray();
                    JsonSerializer serializer = new JsonSerializer();

                    foreach (IListBlobItem blob in blobContainer.ListBlobs(prefix: "SharePointBlobs/"))
                    {
                        if (blob is CloudBlockBlob blockBlob)
                        {
                            var content = blockBlob.DownloadText();
                            var deserialized = JArray.Parse(content);
                            //deserialized = JsonConvert.DeserializeObject(content);
                            list.Merge(deserialized);
                            // Serializing inside the loop writes the entire
                            // accumulated list to the writer again on every pass.
                            serializer.Serialize(jw, list);
                        }
                        else
                        {
                            Console.WriteLine("Non-Block-Blob: " + blob.StorageUri);
                        }
                    }
                }
            }
        }
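For context, the fully streaming shape I am aiming for would look something like the following. This is only a sketch, assuming every file holds a flat JSON array of objects; it writes one outer array and copies each file's elements into it, so only one object is materialized at a time:

```csharp
using (var fs = File.Open(@"C:\Users\Public\Documents\combined.json", FileMode.CreateNew))
using (var sw = new StreamWriter(fs))
using (var jw = new JsonTextWriter(sw))
{
    jw.Formatting = Formatting.None;
    jw.WriteStartArray(); // single outer array for all files

    foreach (IListBlobItem blob in blobContainer.ListBlobs(prefix: "SharePointBlobs/"))
    {
        if (blob is CloudBlockBlob blockBlob)
        {
            using (var reader = new JsonTextReader(new StringReader(blockBlob.DownloadText())))
            {
                while (reader.Read())
                {
                    // Copy each element of the file's array into the outer
                    // array, skipping the file's own Start/EndArray tokens.
                    if (reader.TokenType == JsonToken.StartObject)
                    {
                        JObject.Load(reader).WriteTo(jw);
                    }
                }
            }
        }
    }

    jw.WriteEndArray();
}
```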

1 Answer

In this situation, to keep your processing and memory footprints low, I think I would just concatenate the files one after the other even though it results in technically invalid JSON. To deserialize the combined file later, you can take advantage of the SupportMultipleContent setting on the JsonTextReader class and process the object collections through a stream as if they were one whole collection. See this answer for an example of how to do this.
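For illustration, reading the concatenated file back looks roughly like this. This is a sketch, assuming Newtonsoft.Json and a hypothetical MyItem type standing in for the real object shape in each collection:

```csharp
using System;
using System.IO;
using Newtonsoft.Json;

class CombinedReaderExample
{
    static void Main()
    {
        using (var fileReader = File.OpenText(@"C:\Users\Public\Documents\combined.json"))
        using (var jsonReader = new JsonTextReader(fileReader))
        {
            // Lets the reader continue past the end of the first top-level
            // value instead of throwing when it hits the next '['.
            jsonReader.SupportMultipleContent = true;

            var serializer = new JsonSerializer();
            while (jsonReader.Read())
            {
                // Each pass lands on the StartArray token of the next
                // concatenated collection; only one collection is held
                // in memory at a time.
                var page = serializer.Deserialize<MyItem[]>(jsonReader);
                foreach (var item in page)
                {
                    // ... run your analytics here ...
                }
            }
        }
    }
}

// Hypothetical placeholder for the real object shape.
class MyItem
{
    public string Property { get; set; }
}
```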


2 Comments

That looks like exactly what I need. I am surprised this issue is common enough that they actually made that setting! Thank you.
I was also surprised when I first came across it, but it has come in handy more than once.
