Input data-frame:
{
  "F1" : "A",
  "F2" : "B",
  "F3" : [
    {
      "name" : "N1",
      "sf1" : "val_1",
      "sf2" : "val_2"
    },
    {
      "name" : "N2",
      "sf1" : "val_3",
      "sf2" : "val_4"
    }
  ],
  "F4" : {
    "SF1" : "val_5",
    "SF2" : "val_6",
    "SF3" : "val_7"
  }
}
Desired output:
[
  {
    "F1" : "A",
    "F2" : "B",
    "F3_name" : "N1",
    "F3_sf1" : "val_1",
    "F3_sf2" : "val_2",
    "F4_SF1" : "val_5",
    "F4_SF2" : "val_6",
    "F4_SF3" : "val_7"
  },
  {
    "F1" : "A",
    "F2" : "B",
    "F3_name" : "N2",
    "F3_sf1" : "val_3",
    "F3_sf2" : "val_4",
    "F4_SF1" : "val_5",
    "F4_SF2" : "val_6",
    "F4_SF3" : "val_7"
  }
]
F3 is an array of structs. The new data-frame should be flat, with this single input row expanded into one or more rows (2 in this example), one per element of F3. The F4 struct fields and the top-level scalars (F1, F2) should be repeated on every output row.
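For context, here is a rough sketch of what I imagine the approach might look like in spark-shell, using `explode` on F3 and then selecting the struct fields up to top-level columns. The inline JSON string is just the sample from above, and the column names are my guesses; I have not verified this is the idiomatic way:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, explode}

// spark-shell already provides `spark`; created here so the snippet is self-contained
val spark = SparkSession.builder().master("local[*]").appName("flatten").getOrCreate()
import spark.implicits._

// Single-row sample matching the input above
val json = Seq(
  """{"F1":"A","F2":"B",
     |"F3":[{"name":"N1","sf1":"val_1","sf2":"val_2"},
     |      {"name":"N2","sf1":"val_3","sf2":"val_4"}],
     |"F4":{"SF1":"val_5","SF2":"val_6","SF3":"val_7"}}""".stripMargin
)
val df = spark.read.json(json.toDS)

// explode turns each element of the F3 array into its own row;
// dot notation then lifts the struct fields into flat columns
val flat = df
  .withColumn("F3", explode(col("F3")))
  .select(
    col("F1"), col("F2"),
    col("F3.name").as("F3_name"),
    col("F3.sf1").as("F3_sf1"),
    col("F3.sf2").as("F3_sf2"),
    col("F4.SF1").as("F4_SF1"),
    col("F4.SF2").as("F4_SF2"),
    col("F4.SF3").as("F4_SF3")
  )

flat.show(false)
```

If the schema is not fixed, I suppose the `select` list could be built programmatically from `df.schema`, but I am not sure how that is usually done.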
I am new to Spark and Scala. Any thoughts on how to achieve this transformation would be very helpful.
Thanks!