Convert or flatten a JSON having nested data with struct/array to columns

Question

The following JSON contains a nested attribute named “result”, which holds an array of records; each record is itself an array of key-value pairs.

{
"result": [
    [
        {
            "key": "projects.name",
            "value": "Project 1",
            "type": "TEXT"
        },
        {
            "key": "projects.status",
            "value": "Archived",
            "type": "ENUM"
        },
        {
            "key": "user_tasks.start_date",
            "value": "2021-07-08 11:59:34",
            "type": "DATETIME"
        },
        {
            "key": "user_tasks.name",
            "value": "Section 1",
            "type": "TEXT"
        },
        {
            "key": "track_user.duration",
            "value": "00:40:02",
            "type": "INT"
        },
        {
            "key": "project_sections.question_count",
            "value": "24",
            "type": "SMALLINT"
        },
        {
            "key": "project_sections.assigned_to_users",
            "value": "[email protected]",
            "type": "JSON"
        }
    ],
    [
        {
            "key": "projects.name",
            "value": "Project 2",
            "type": "TEXT"
        },
        {
            "key": "projects.status",
            "value": "Archived",
            "type": "ENUM"
        },
        {
            "key": "user_tasks.start_date",
            "value": "2021-07-08 11:59:34",
            "type": "DATETIME"
        },
        {
            "key": "user_tasks.name",
            "value": "Section 2",
            "type": "TEXT"
        },
        {
            "key": "track_user.duration",
            "value": "00:40:02",
            "type": "INT"
        },
        {
            "key": "project_sections.question_count",
            "value": "23",
            "type": "SMALLINT"
        },
        {
            "key": "project_sections.assigned_to_users",
            "value": "[email protected]",
            "type": "JSON"
        }
    ],
    [
        {
            "key": "projects.name",
            "value": "Project 3",
            "type": "TEXT"
        },
        {
            "key": "projects.status",
            "value": "Archived",
            "type": "ENUM"
        },
        {
            "key": "user_tasks.start_date",
            "value": "2021-07-20 21:30:00",
            "type": "DATETIME"
        },
        {
            "key": "user_tasks.name",
            "value": "Internal Due Date",
            "type": "TEXT"
        },
        {
            "key": "track_user.duration",
            "value": "21:22:49",
            "type": "INT"
        },
        {
            "key": "project_sections.question_count",
            "value": "0",
            "type": "SMALLINT"
        },
        {
            "key": "project_sections.assigned_to_users",
            "value": "[email protected]",
            "type": "JSON"
        }
    ]
]
}

What I want is to expand this JSON so that every key in the nested arrays becomes a column, as in the “Expected Output” section below, using Spark SQL / Scala:

I tried using the explode and pivot functions, but it is not working properly.

If my answer solved your question, you can mark it "accepted".
– Pradeep yadav, Commented Aug 7, 2021 at 15:07

Pradeep yadav · Accepted Answer · 2021-08-07 10:26:12Z

I tried your problem; here is the solution.

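import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions._
import spark.implicits._

// json_ip is a String holding the JSON document from the question.
val DF = spark.read.json(spark.createDataset(json_ip :: Nil))

// display(...) is a notebook helper (e.g. on Databricks); elsewhere, call .show(false) on the result.
display(
  DF.select(explode($"result"))                                      // one row per inner array, in column "col"
    .withColumn("r_num", row_number().over(Window.orderBy($"col")))  // number the records so they can be regrouped
    .withColumn("res_exp", explode($"col"))                          // one row per key/value struct
    .drop($"col")
    .withColumn("all_row_values", $"res_exp.value")
    .withColumn("columns", $"res_exp.key")
    .drop("res_exp")
    .groupBy($"r_num")
    .pivot($"columns")                                               // each distinct key becomes a column
    .agg(first($"all_row_values"))                                   // one value per record and key
    .drop("r_num")
)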
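
A possible variant of the pipeline above, offered as a sketch rather than a drop-in replacement. Window.orderBy on "col" with no partitioning moves every row into a single partition and numbers the records by array content rather than by their position in "result". Using posexplode instead keeps each record's original index. The names FlattenResult and flattenResult and the local SparkSession settings are assumptions for illustration, not part of the original answer; the sketch also assumes Spark 2.2 or later, where spark.read.json accepts a Dataset[String].

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, explode, first, posexplode}

// Hypothetical helper object; the names are illustrative only.
object FlattenResult {

  // Turns {"result": [[{key, value, type}, ...], ...]} into one row per inner
  // array, with one column per distinct key.
  def flattenResult(spark: SparkSession, json: String): DataFrame = {
    import spark.implicits._

    spark.read.json(Seq(json).toDS())
      // posexplode returns the array index next to each inner array
      .select(posexplode(col("result")).as(Seq("r_num", "record")))
      // one row per key/value struct, still tagged with its record index
      .select(col("r_num"), explode(col("record")).as("kv"))
      .select(col("r_num"), col("kv.key").as("key"), col("kv.value").as("value"))
      .groupBy("r_num")
      .pivot("key")          // each distinct key becomes a column
      .agg(first("value"))   // one value per (record, key) pair
      .orderBy("r_num")      // restore the original record order
      .drop("r_num")
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local session, for trying the sketch outside a notebook.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("flatten-result")
      .getOrCreate()

    // Shortened sample with the same shape as the JSON in the question.
    val json =
      """{"result":[
        |[{"key":"projects.name","value":"Project 1","type":"TEXT"},
        | {"key":"projects.status","value":"Archived","type":"ENUM"}],
        |[{"key":"projects.name","value":"Project 2","type":"TEXT"},
        | {"key":"projects.status","value":"Archived","type":"ENUM"}]
        |]}""".stripMargin

    flattenResult(spark, json).show(false)
    spark.stop()
  }
}
```

All pivoted columns come back as strings, because "value" is a string in the input. The resulting column names contain dots, so they need backticks when referenced later, for example col("`projects.name`").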

output:

1 Comment

user3279174 Over a year ago

Thank you very much, Pradeep. This is exactly what I need.
