I'm fairly new to pyspark
My code what I have tried so far
ConceptGriddf = spark.sql("""
SELECT DataID
,collect_list(struct(ConceptGridName AS Title,
named_struct("ZoneName",ZoneName
,"Zone", Zone
,"ZoneScore", ZoneScore)as Zones
))ConceptGrid
FROM OutputTable
GROUP BY DataID """)
if dbutils.widgets.get("Debug") == 'yes':
display(ConceptGriddf.limit(10))
The table
DataID ConceptGridName Zone ZoneName ZoneScore
1 Reserved_89 y Shared 0.58115
1 Reserved_89 x Unshared 0.4939
Here's my current JSON output in the dataframe
"ConceptGrid": [
{
"Title": "Reserved_89",
"Zones": {
"ZoneName": "Shared",
"Zone": "y",
"ZoneScore": 0.58115
}
},
{
"Title": "Reserved_89",
"Zones": {
"ZoneName": "Unshared",
"Zone": "x",
"ZoneScore": 0.4939
}
}
]
Instead I want it to look something like this and I'm not able to achieve with the functions I'm using here
"ConceptGrid": [
{
"Title":"Reserved_89",
"Zones":[
{
"ZoneName":"Shared",
"Zone":"y",
"ZoneScore":0.58115
},
{
"ZoneName":"Unshared",
"Zone":"x",
"ZoneScore":0.4939
}
]
}
]