Im ingesting JSON files into spark and i have come across an object as below in the nested JSON from the file
"data": {
"key1" :"v1"
"key2" : [
{"nk1" :"nv1"},
{"nk2" :"nv2" },
{"nk3" :"nv3" }
]
}
After reading it in spark, it is changing into below format:
"data": {
"key1" :"v1"
"key2" : [
{"nk1" :"nv1", "nk2" :null, "nk3" :null},
{"nk1" :null, "nk2" :"nv2", "nk3" :null},
{"nk1" :null, "nk2" :null, "nk3" :"nv3"}
]
}
I need them as columns in the spark dataframe
| "key1" | "nk1" | "nk2" | "nk3" |
|---|---|---|---|
| "v1" | "kv1" | "kv2" | "kv3" |
Please help me with any solution for this. I'm thinking to convert this to string and use regex. Is there any better solution?