
The following JSON contains a nested attribute named “result”, which holds an array of records. Each record is itself an array of key-value objects with “key”, “value” and “type” fields.

{
"result": [
    [
        {
            "key": "projects.name",
            "value": "Project 1",
            "type": "TEXT"
        },
        {
            "key": "projects.status",
            "value": "Archived",
            "type": "ENUM"
        },
        {
            "key": "user_tasks.start_date",
            "value": "2021-07-08 11:59:34",
            "type": "DATETIME"
        },
        {
            "key": "user_tasks.name",
            "value": "Section 1",
            "type": "TEXT"
        },
        {
            "key": "track_user.duration",
            "value": "00:40:02",
            "type": "INT"
        },
        {
            "key": "project_sections.question_count",
            "value": "24",
            "type": "SMALLINT"
        },
        {
            "key": "project_sections.assigned_to_users",
            "value": "[email protected]",
            "type": "JSON"
        }
    ],
    [
        {
            "key": "projects.name",
            "value": "Project 2",
            "type": "TEXT"
        },
        {
            "key": "projects.status",
            "value": "Archived",
            "type": "ENUM"
        },
        {
            "key": "user_tasks.start_date",
            "value": "2021-07-08 11:59:34",
            "type": "DATETIME"
        },
        {
            "key": "user_tasks.name",
            "value": "Section 2",
            "type": "TEXT"
        },
        {
            "key": "track_user.duration",
            "value": "00:40:02",
            "type": "INT"
        },
        {
            "key": "project_sections.question_count",
            "value": "23",
            "type": "SMALLINT"
        },
        {
            "key": "project_sections.assigned_to_users",
            "value": "[email protected]",
            "type": "JSON"
        }
    ],
    [
        {
            "key": "projects.name",
            "value": "Project 3",
            "type": "TEXT"
        },
        {
            "key": "projects.status",
            "value": "Archived",
            "type": "ENUM"
        },
        {
            "key": "user_tasks.start_date",
            "value": "2021-07-20 21:30:00",
            "type": "DATETIME"
        },
        {
            "key": "user_tasks.name",
            "value": "Internal Due Date",
            "type": "TEXT"
        },
        {
            "key": "track_user.duration",
            "value": "21:22:49",
            "type": "INT"
        },
        {
            "key": "project_sections.question_count",
            "value": "0",
            "type": "SMALLINT"
        },
        {
            "key": "project_sections.assigned_to_users",
            "value": "[email protected]",
            "type": "JSON"
        }
    ]
]
}
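If you would rather not rely on schema inference, the nesting can be spelled out as an explicit Spark schema: “result” is an array of records, and each record is an array of key/value/type structs. This is a minimal sketch that assumes spark-sql is on the classpath. The object name ResultSchema is hypothetical, and all three fields are typed as strings because every value in the sample is a JSON string, including "24" and "00:40:02".

```scala
import org.apache.spark.sql.types.{ArrayType, StringType, StructField, StructType}

// Hypothetical explicit schema for the sample JSON (field names taken from the sample).
object ResultSchema {
  // One key/value/type entry; all three are JSON strings in the sample.
  val keyValue: StructType = StructType(Seq(
    StructField("key", StringType),
    StructField("value", StringType),
    StructField("type", StringType)
  ))

  // "result" is an array of records, each record an array of key/value entries.
  val schema: StructType = StructType(Seq(
    StructField("result", ArrayType(ArrayType(keyValue)))
  ))
}
```

It can be passed to the reader with spark.read.schema(ResultSchema.schema).json(ds), where ds is a Dataset[String] holding the JSON.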

Now I want to expand this JSON so that each inner array becomes one row and each key becomes its own column, as in the “Expected Output” below, using Spark SQL / Scala:

[Image: Expected Output]

I tried using the explode and pivot functions, but I could not get them to produce this output.
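The target shape can be modelled in plain Scala, without Spark, to make the explode-then-pivot idea concrete. Every record becomes one row, every distinct key across all records becomes a column, and a key that a record lacks becomes an empty cell. This is a hypothetical sketch (PivotModel and its methods are illustrative names, not a Spark API). The keys are sorted, as Spark's pivot does when it has to discover the pivot values itself.

```scala
// Hypothetical plain-Scala model (no Spark) of the target shape: each record
// (inner list of key/value pairs) becomes one row, each distinct key a column.
object PivotModel {
  type Pair = (String, String)

  // Column names: every distinct key across all records, sorted.
  def columns(records: Seq[Seq[Pair]]): Seq[String] =
    records.flatten.map(_._1).distinct.sorted

  // One row per record; a key the record lacks becomes None (null in a DataFrame).
  def pivot(records: Seq[Seq[Pair]]): Seq[Seq[Option[String]]] = {
    val cols = columns(records)
    records.map { record =>
      // If a key repeats within a record, keep the first value seen.
      val byKey = record.foldLeft(Map.empty[String, String]) { case (acc, (k, v)) =>
        if (acc.contains(k)) acc else acc + (k -> v)
      }
      cols.map(byKey.get)
    }
  }
}
```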

  • If my answer solved your question, you can mark it as accepted. (Commented Aug 7, 2021 at 15:07)

1 Answer


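I tried your problem; here is a solution. It explodes the outer array so that each record becomes a row, numbers the records with row_number, explodes each record into its key-value structs, and then pivots on the key so that every key becomes a column.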

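import org.apache.spark.sql.functions._
import org.apache.spark.sql.expressions._
import spark.implicits._

// json_ip (not defined in the original answer) is assumed to be a String holding the JSON above
val DF = spark.read.json(spark.createDataset(json_ip :: Nil))

val result = DF
  .select(explode($"result"))                                     // one row per record (inner array), in column "col"
  .withColumn("r_num", row_number().over(Window.orderBy($"col"))) // number each record so its pairs can be regrouped
  .withColumn("res_exp", explode($"col"))                         // one row per key-value struct
  .drop($"col")
  .withColumn("all_row_values", $"res_exp.value")
  .withColumn("columns", $"res_exp.key")
  .drop("res_exp")
  .groupBy($"r_num")
  .pivot($"columns")                                              // every distinct key becomes a column
  .agg(first($"all_row_values"))
  .drop("r_num")

// display() is a Databricks notebook helper; outside Databricks use result.show(false)
display(result)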

Output:

[Image: pivoted output table]
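The same explode + pivot approach can also be written with posexplode in place of row_number. posexplode emits each inner array's position alongside it, so no Window is needed. This avoids two side effects of Window.orderBy($"col") without partitionBy: Spark moves all rows to a single partition, and the records are numbered by their array contents rather than by their original position. This is a sketch under those assumptions (the object name PosexplodePivot is hypothetical), not the answerer's code.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, explode, first, posexplode}

// Sketch: explode + pivot with posexplode supplying the record number.
object PosexplodePivot {
  // Expects a column "result" of type array<array<struct<key, value, ...>>>.
  def flatten(df: DataFrame): DataFrame =
    df.select(posexplode(col("result")).as(Seq("r_num", "kvs"))) // r_num = index of the inner array
      .select(col("r_num"), explode(col("kvs")).as("kv"))         // one row per key/value struct
      .select(col("r_num"), col("kv.key").as("key"), col("kv.value").as("value"))
      .groupBy("r_num")
      .pivot("key")                                                // each distinct key becomes a column
      .agg(first("value"))                                         // one value per (record, key) in this data
      .orderBy("r_num")                                            // keep the original record order
      .drop("r_num")
}
```

Because the resulting column names contain dots (for example projects.name), refer to them with backticks in later expressions, e.g. col("`projects.name`").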


  • Thank you very much, Pradeep. This is exactly what I need.
