
I have a JSON array in the following format:

{
  "marks": [
    {
      "subject": "Maths",
      "mark": "80"
    },
    {
      "subject": "Physics",
      "mark": "70"
    },
    {
      "subject": "Chemistry",
      "mark": "60"
    }
  ]
}

I need to split each object in the array into a separate JSON file. Is there any way to do this in the Spark shell?

1 Answer

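You can explode the marks array of structs into one row per element with inline, add a unique ID column, and write the rows as JSON partitioned by that ID. Repartitioning by id first ensures that all rows with a given id land in the same task, so each id=<value> output directory gets exactly one JSON file. Note that partitionBy moves id into the directory name (it is not written inside the file), and monotonically_increasing_id gives unique but not necessarily consecutive values.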

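import org.apache.spark.sql.functions.{col, monotonically_increasing_id}

// "marks.json" is a placeholder path; multiLine is needed because the object spans several lines
val df = spark.read.option("multiLine", "true").json("marks.json")

df.selectExpr("inline(marks)")                       // one row per (subject, mark)
  .withColumn("id", monotonically_increasing_id())   // unique (not consecutive) ID per row
  .repartition(col("id"))                            // all rows with the same id in one task
  .write
  .partitionBy("id")                                 // one id=<value> directory per row
  .json("output")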
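The partitionBy approach produces id=<value> directories with Spark-generated part file names rather than files you name yourself. If the marks array is small enough to collect to the driver, another option (not part of the original answer) is to render each exploded row as a JSON string with toJSON and write the files with java.nio. This is a minimal sketch: the input path marks.json, the output directory out and the marks_<i>.json naming scheme are hypothetical, and the multiLine option assumes Spark 2.2 or later.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path, Paths}
import org.apache.spark.sql.SparkSession

object SplitMarksOnDriver {
  // Writes each element of the marks array to its own file (outDir/marks_<i>.json).
  // Assumes the array is small enough to collect to the driver.
  def split(spark: SparkSession, inputPath: String, outDir: String): Seq[Path] = {
    val rows: Array[String] = spark.read
      .option("multiLine", "true") // the input object spans several lines
      .json(inputPath)
      .selectExpr("inline(marks)") // one row per array element
      .toJSON                      // each row rendered as a JSON string
      .collect()

    val dir = Files.createDirectories(Paths.get(outDir))
    rows.toSeq.zipWithIndex.map { case (json, i) =>
      // marks_<i>.json is a hypothetical naming scheme; i follows the collected row order
      Files.write(dir.resolve(s"marks_$i.json"), json.getBytes(StandardCharsets.UTF_8))
    }
  }

  def main(args: Array[String]): Unit = {
    // In spark-shell `spark` already exists; the builder is only for standalone runs
    val spark = SparkSession.builder().master("local[*]").appName("split-marks").getOrCreate()
    try split(spark, "marks.json", "out").foreach(println)
    finally spark.stop()
  }
}
```

The trade-off is scale: collecting pulls every element onto the driver, so this only suits small arrays, while the partitionBy version stays distributed.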

1 Comment

Where does df come from? What if I just have the JSON?
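When the JSON exists only as a string (for example, pasted into the shell) rather than as a file, Spark 2.2 and later can parse it from a Dataset[String] via spark.read.json, so no input file is needed. A sketch under that assumption: SplitMarksFromString and fromJsonString are hypothetical names, and mode("overwrite") is added so that re-running does not fail on an existing output directory. Inside spark-shell, where spark.implicits._ is already imported, the first step reduces to val df = spark.read.json(Seq(json).toDS()).

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, monotonically_increasing_id}

object SplitMarksFromString {
  // Builds the DataFrame from a JSON string instead of a file.
  // Each element of the Dataset[String] is parsed as one complete JSON document.
  def fromJsonString(spark: SparkSession, json: String): DataFrame = {
    import spark.implicits._ // provides the Encoder[String] needed by toDS()
    spark.read.json(Seq(json).toDS())
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("split-marks").getOrCreate()
    // The JSON from the question, held in memory
    val json =
      """{"marks":[{"subject":"Maths","mark":"80"},{"subject":"Physics","mark":"70"},{"subject":"Chemistry","mark":"60"}]}"""
    try {
      fromJsonString(spark, json)
        .selectExpr("inline(marks)")                     // one row per (subject, mark)
        .withColumn("id", monotonically_increasing_id()) // unique ID per row
        .repartition(col("id"))                          // same id -> same task -> one file
        .write
        .mode("overwrite")                               // replace output from earlier runs
        .partitionBy("id")
        .json("output")
    } finally spark.stop()
  }
}
```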
