I'm working in a Python 3 notebook in Azure Databricks with Spark 3.0.1.
I have the following DataFrame:
+---+---------+
|ID |Name |
+---+---------+
|1 |John |
|2 |Michael |
+---+---------+
It can be created with this code:
from pyspark.sql.types import StructType, StructField, StringType, IntegerType

# Two-element tuples to match the two-field schema below
data2 = [(1, "John"),
         (2, "Michael")]

schema = StructType([
    StructField("ID", IntegerType(), True),
    StructField("Name", StringType(), True),
])

df1 = spark.createDataFrame(data=data2, schema=schema)
df1.show(truncate=False)
I am trying to transform it into an object that can be serialized to JSON, with a single property called Entities that is an array of the rows in the DataFrame.
Like this:
{
"Entities": [
{
"ID": 1,
"Name": "John"
},
{
"ID": 2,
"Name": "Michael"
}
]
}
I've been trying to figure out how to do this but haven't had any luck so far. Can anyone point me in the right direction, please?
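For context, one direction that seems plausible (a sketch, not a verified answer for this exact Databricks setup): collect the rows to the driver, convert each pyspark `Row` to a plain dict with `Row.asDict()`, wrap the resulting list under an `"Entities"` key, and serialize with the standard `json` module. The snippet below mocks the collected rows as plain dicts so it runs without a Spark session; with the real DataFrame the list would come from `[row.asDict() for row in df1.collect()]`.

```python
import json

# With a live Spark session, this list would be built as:
#   rows = [row.asDict() for row in df1.collect()]
# Here the collected rows are mocked as plain dicts for illustration.
rows = [{"ID": 1, "Name": "John"},
        {"ID": 2, "Name": "Michael"}]

# Wrap the row dicts under a single "Entities" property and serialize
payload = {"Entities": rows}
print(json.dumps(payload, indent=2))
```

Note that `collect()` pulls all rows to the driver, so this sketch only suits DataFrames small enough to fit in driver memory; for larger data, a Spark-side approach (e.g. aggregating with `collect_list` over a `struct` of the columns) may be more appropriate.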