Below is my input schema.
root
|-- Date: string (nullable = true)
|-- tax: array (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- ID: string (nullable = true)
| | |-- hst: double (nullable = true)
| | |-- gst: double (nullable = true)
My required output is as below.
+-----------+---------+------------------+----------------------+
| DATE | ID |hst | gst |
+-----------+---------+------------------+----------------------+
|14-Jun-2022| 1 | 100.0| 12.0|
|14-Jun-2022| 2 | 80.0| 6.0|
+-----------+---------+------------------+----------------------+
I am using the below code to explode and select the columns from the exploded result.
df = sourceDF.withColumn("exploded_column", explode(col("tax")))
explodeddf = testdf.select(col("DATE"),col("exploded_column.ID"),col("exploded_column.hst"),col("exploded_column.gst"))
I was able to get the output. But I want to know if there is any other way to achieve this if my source dataframe changes frequently.
Thank you.