I have a JSON structure with multiple nested arrays, like this:
{
  "id": 10,
  "packages": [
    {
      "packageId": 1010,
      "clusters": [
        {
          "fieldClusterId": 101010,
          "fieldDefinitions": [
            {
              "fieldId": 101011112999,
              "fieldName": "EntityId"
            }
          ]
        }
      ]
    }
  ]
}
I'm using Spark SQL to flatten the nested arrays into something like this:
| id | packageId | fieldClusterId | fieldId | fieldName |
|---|---|---|---|---|
| 10 | 1010 | 101010 | 101011112999 | EntityId |
The query ends up being a fairly ugly Spark SQL CTE with multiple steps:
%sql
with cte as (
  select
    id,
    explode(packages) as packages_exploded
  from temp),
cte2 as (
  select
    id,
    packages_exploded.packageId,
    explode(packages_exploded.clusters) as col
  from cte),
cte3 as (
  select
    id,
    packageId,
    col.fieldClusterId,
    explode(col.fieldDefinitions) as col
  from cte2)
select
  id,
  packageId,
  fieldClusterId,
  col.*
from cte3
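For clarity, the result I'm after is just this nested-loop flattening — here it is as a plain-Python sketch over the sample document (not how I run it, just the equivalent logic the CTE implements):

```python
import json

doc = json.loads("""
{
  "id": 10,
  "packages": [
    {
      "packageId": 1010,
      "clusters": [
        {
          "fieldClusterId": 101010,
          "fieldDefinitions": [
            {"fieldId": 101011112999, "fieldName": "EntityId"}
          ]
        }
      ]
    }
  ]
}
""")

# Each level of nesting becomes one loop; parent keys are carried down
# onto every leaf row, which is exactly what the chained explodes do.
rows = []
for pkg in doc["packages"]:
    for cluster in pkg["clusters"]:
        for field in cluster["fieldDefinitions"]:
            rows.append({
                "id": doc["id"],
                "packageId": pkg["packageId"],
                "fieldClusterId": cluster["fieldClusterId"],
                "fieldId": field["fieldId"],
                "fieldName": field["fieldName"],
            })

# For this sample: a single row with
# id=10, packageId=1010, fieldClusterId=101010, fieldId=101011112999, fieldName='EntityId'
print(rows)
```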
Is there a nicer syntax to accomplish this multi-level explode?