I have a structure like the following in orc/parquet format.
{
  "Register": {
    "Persons": [
      {
        "Name": "Name1",
        "Age": 12,
        "Address": [
          { "Apt": "Apt1" }
        ],
        "Phone": [
          { "PhoneNum": 1234 }
        ]
      },
      {
        "Name": "Name2",
        "Age": 14,
        "Address": [
          { "Apt": "Apt2" }
        ],
        "Phone": [
          { "PhoneNum": 55555 }
        ]
      }
    ]
  }
}
I need to create a new DataFrame in which, for the entry matching Apt = "Apt1", the phone number is changed to 7777. NB: the output must keep the same nested structure. I have tried a couple of methods in Scala/Spark, but I am not able to update the nested array-of-struct type. Any expert advice would be helpful.
Update: following the link below, I am able to update named_struct fields. When it comes to arrays, however, I am not able to find the answer. https://kb.databricks.com/data/update-nested-column.html#how-to-update-nested-columns
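One possible approach (a sketch, untested against this exact data, and assuming Spark 2.4 or later, where the SQL higher-order functions transform and exists are available) is to rebuild the Persons array inside expr, copying every field and rewriting Phone only for persons whose Address array contains Apt = "Apt1":

```scala
import org.apache.spark.sql.functions.expr

// Rebuild Register.Persons element by element. Field names follow the
// schema shown above; df is the DataFrame read from the ORC/Parquet file.
val updated = df.withColumn(
  "Register",
  expr("""
    named_struct('Persons',
      transform(Register.Persons, p ->
        named_struct(
          'Name',    p.Name,
          'Age',     p.Age,
          'Address', p.Address,
          'Phone',   CASE WHEN exists(p.Address, a -> a.Apt = 'Apt1')
                          THEN transform(p.Phone,
                                 ph -> named_struct('PhoneNum', 7777))
                          ELSE p.Phone
                     END
        )
      )
    )
  """)
)

// The schema (and thus the nested structure) should be unchanged.
updated.printSchema()
```

Because named_struct literals take the type of the value given, 7777 may need a CAST (e.g. CAST(7777 AS BIGINT)) if PhoneNum is a long in the source schema, otherwise the rebuilt struct will not match the original type.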
Comment: can you share the output of df.printSchema()?