I'm parsing JSON files (sizes vary from a few KB to a few GB) that are built as an array of objects, for example:
{
    "records": [
        {
            "col1": "someValue",
            "col2": "someValue",
            "col3": "someValue"
        },
        {
            "col1": "someValue",
            "col2": "someValue",
            "col3": "someValue"
        },
        {
            "col1": "someValue",
            "col2": "someValue",
            "col3": "someValue"
        }
    ]
}
The records represent individual rows of data from a table and the file always contains data for one table only.
I can extract the table's metadata and parse the file without any issues. I'm using the JSON.simple library to do this.
What I'm trying to do now is validate that all of the objects have exactly the same keys, no more and no less, because the data needs to be ingested into a table. I can extract the keys with the keySet() method and put them into a list, but comparing one list to another once per row (anywhere from a few rows to millions) seems like a very poor and costly implementation.
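Roughly, this is the check I have in mind (the class name and file path are just placeholders; "records" comes from the example above):

import java.io.FileReader;
import java.util.Set;

import org.json.simple.JSONArray;
import org.json.simple.JSONObject;
import org.json.simple.parser.JSONParser;

public class RecordKeyCheck {

    // Returns true when every object in the "records" array has exactly the
    // same key set as the first object.
    public static boolean allRecordsHaveSameKeys(String path) throws Exception {
        JSONObject root = (JSONObject) new JSONParser().parse(new FileReader(path));
        JSONArray records = (JSONArray) root.get("records");
        if (records == null || records.isEmpty()) {
            return true;
        }

        Set<?> referenceKeys = ((JSONObject) records.get(0)).keySet();
        for (Object item : records) {
            // Set.equals checks size and membership, so each comparison costs
            // roughly O(number of columns), but it is still one pass per row.
            if (!((JSONObject) item).keySet().equals(referenceKeys)) {
                return false;
            }
        }
        return true;
    }
}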
Is there a nicer solution that could quickly compare the keys of all JSON objects in the JSON array?