I'm parsing JSON files (sizes range from a few KB to a few GB) that each hold an array of objects, for example:

{
  "records": [
    {
      "col1": "someValue",
      "col2": "someValue",
      "col3": "someValue"
    },
    {
      "col1": "someValue",
      "col2": "someValue",
      "col3": "someValue"
    },
    {
      "col1": "someValue",
      "col2": "someValue",
      "col3": "someValue"
    }
  ]
}

The records represent individual rows of data from a table and the file always contains data for one table only.

I can extract the table's metadata and parse the file without any issues. I'm using the JSON.simple library for this.

What I'm trying to do now is validate that all of the objects have exactly the same keys, no more and no fewer, because the data needs to be ingested into a table. I can extract the keys with the keySet() method and put them into a list, but comparing one list against another once per row (from a few rows to millions) seems like a very poor and costly implementation.
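A minimal sketch of the keySet() approach described above, assuming the records have already been parsed. In JSON.simple 1.1.1, `JSONArray` extends `ArrayList` and `JSONObject` extends `HashMap`, so a parsed array can be passed in directly. The class and method names are hypothetical. The first record's keys serve as the reference, and every other record's `keySet()` is compared with `Set.equals` instead of building and comparing lists.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class KeyCheck {

    /**
     * Returns the index of the first record whose key set differs from the
     * first record's key set, or -1 if all records have identical keys.
     * Each element of the list is expected to be a Map (e.g. a JSONObject).
     */
    static int firstMismatch(List<?> records) {
        if (records.isEmpty()) {
            return -1;
        }
        // Copy of the first record's keys, used as the reference.
        Set<Object> reference = new HashSet<>(((Map<?, ?>) records.get(0)).keySet());
        for (int i = 1; i < records.size(); i++) {
            Set<?> keys = ((Map<?, ?>) records.get(i)).keySet();
            // Set.equals compares sizes first, then checks that every key is
            // contained; with hash-based sets this is expected O(n) for n keys.
            if (!reference.equals(keys)) {
                return i;
            }
        }
        return -1;
    }
}
```

Key order does not matter here, since `{"col1", "col2"}` and `{"col2", "col1"}` compare as equal sets, and the per-row comparison is a single hash lookup per key.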

Is there a good way to quickly compare the keys of all JSON objects in a JSON array?

  • Do you know the required column names beforehand? Commented Sep 30, 2016 at 11:31
  • No, I don't know column names. Commented Sep 30, 2016 at 11:32

1 Answer

You cannot avoid looking at every key (n keys per row) of every row (m rows) of your data, so the complexity cannot go below O(n * m) anyway.
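Since every key has to be visited anyway, the same pass can also report what is wrong with a bad record. A minimal sketch, assuming the records are `Map`s with `String` keys (a raw JSON.simple `JSONArray` would need an unchecked cast). `KeyDiff` and its methods are hypothetical names. The set differences are only computed for records that fail the fast `equals` check.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class KeyDiff {

    /** Keys present in the reference but absent from the record (sorted). */
    static Set<String> missing(Set<String> reference, Map<String, ?> record) {
        Set<String> result = new TreeSet<>(reference);
        result.removeAll(record.keySet());
        return result;
    }

    /** Keys present in the record but absent from the reference (sorted). */
    static Set<String> extra(Set<String> reference, Map<String, ?> record) {
        Set<String> result = new TreeSet<>(record.keySet());
        result.removeAll(reference);
        return result;
    }

    /** One message per record whose keys differ from record 0; empty if all match. */
    static List<String> report(List<? extends Map<String, ?>> records) {
        List<String> problems = new ArrayList<>();
        if (records.isEmpty()) {
            return problems;
        }
        Set<String> reference = new HashSet<>(records.get(0).keySet());
        for (int i = 1; i < records.size(); i++) {
            Map<String, ?> record = records.get(i);
            if (reference.equals(record.keySet())) {
                continue; // fast path: identical key set
            }
            problems.add("record " + i
                    + ": missing=" + missing(reference, record)
                    + " extra=" + extra(reference, record));
        }
        return problems;
    }
}
```

For example, with a reference of `col1, col2, col3`, a record containing `col1, col3, col4` would be reported as `record 2: missing=[col2] extra=[col4]` (if it is the third record).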

2 Comments

Yeah, sure, I do need to look at every key in every object, but what would be the best approach? I was hoping one of the JSON libraries would have a method for validating keys.
Or maybe not validating but comparing, since validation would require a schema.
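For files of several GB, parsing the whole array into memory before comparing may not be feasible. A minimal sketch of an incremental check that only holds the reference keys and the current record's keys. `StreamingKeyValidator` and its methods are hypothetical. The intended wiring (an assumption, not shown or tested here) is to call `startRecord`/`key`/`endRecord` from a streaming parser's callbacks, for example JSON.simple's `ContentHandler` (`startObject`, `startObjectEntry`, `endObject`) used with `JSONParser.parse(Reader, ContentHandler)`, while tracking nesting depth so that only the objects directly inside `"records"` are counted.

```java
import java.util.HashSet;
import java.util.Set;

/** Incremental key-set check over a sequence of records, one record at a time. */
final class StreamingKeyValidator {
    private Set<String> reference;                    // keys of the first record
    private final Set<String> current = new HashSet<>();
    private long recordIndex = -1;
    private long firstBadRecord = -1;

    /** Call when a new record object begins. */
    void startRecord() {
        current.clear();
        recordIndex++;
    }

    /** Call once for each key of the current record. */
    void key(String name) {
        current.add(name);
    }

    /** Call when the current record object ends. */
    void endRecord() {
        if (reference == null) {
            reference = new HashSet<>(current);      // first record becomes the reference
        } else if (firstBadRecord < 0 && !reference.equals(current)) {
            firstBadRecord = recordIndex;            // remember only the first mismatch
        }
    }

    /** Index of the first record whose keys differ from record 0, or -1. */
    long firstBadRecord() {
        return firstBadRecord;
    }
}
```

Memory use stays proportional to the number of columns rather than the number of rows, which is what makes this variant suitable for the multi-GB case.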