
How can I split an array of JSON in a DataFrame into multiple rows in Spark-Scala?

Input DataFrame:

    +----------+-------+-----------------------------------------------------------------------------------------------------------------------------+
    |item_id   |s_tag  |jsonString                                                                                                                   |
    +----------+-------+-----------------------------------------------------------------------------------------------------------------------------+
    |Item_12345|S_12345|[{"First":{"Info":"ABCD123","Res":"5.2"}},{"Second":{"Info":"ABCD123","Res":"5.2"}},{"Third":{"Info":"ABCD123","Res":"5.2"}}]|
    +----------+-------+-----------------------------------------------------------------------------------------------------------------------------+


Output DataFrame:

    +----------+-------+-----------------------------------------+
    |item_id   |s_tag  |jsonString                               |
    +----------+-------+-----------------------------------------+
    |Item_12345|S_12345|{"First":{"Info":"ABCD123","Res":"5.2"}} |
    |Item_12345|S_12345|{"Second":{"Info":"ABCD123","Res":"5.2"}}|
    |Item_12345|S_12345|{"Third":{"Info":"ABCD123","Res":"5.2"}} |
    +----------+-------+-----------------------------------------+

This is what I have tried so far, but it did not work:

val rawDF = sparkSession
  .sql("select 1")
  .withColumn("item_id", lit("Item_12345"))
  .withColumn("s_tag", lit("S_12345"))
  .withColumn("jsonString", lit("""[{"First":{"Info":"ABCD123","Res":"5.2"}},{"Second":{"Info":"ABCD123","Res":"5.2"}},{"Third":{"Info":"ABCD123","Res":"5.2"}}]"""))

// fails: explode expects an array or map column, but jsonString is a plain string
val newDF = rawDF.withColumn("splittedJson", explode(rawDF.col("jsonString")))
1 Answer

The issue in the example code you posted is that jsonString is a plain string column, and explode can only be applied to array or map columns. Try something like this:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._


object tmp {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").getOrCreate()

    // Build the column as an actual array of JSON strings, one element per object
    val arr = Seq("{\"First\":{\"Info\":\"ABCD123\",\"Res\":\"5.2\"}}",
      "{\"Second\":{\"Info\":\"ABCD123\",\"Res\":\"5.2\"}}",
      "{\"Third\":{\"Info\":\"ABCD123\",\"Res\":\"5.2\"}}")
    val rawDF = spark.sql("select 1")
      .withColumn("item_id", lit("Item_12345"))
      .withColumn("s_tag", lit("S_12345"))
      .withColumn("jsonString", typedLit(arr)) // typedLit keeps the Seq as array<string>

    // explode now emits one row per array element
    val newDF = rawDF.withColumn("splittedJson", explode(rawDF.col("jsonString")))

    newDF.show()
  }
}
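
If the data really arrives as a single JSON string (as in the question) rather than something you can rebuild by hand, a minimal sketch of the same idea, assuming Spark 2.4+, is to parse the string into an array<string> with from_json and then explode the result. When the element type is StringType, Spark keeps each array element as its raw JSON text, which is exactly the shape the expected output asks for. The object name SplitJsonString is just a placeholder:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, explode, from_json, lit}
import org.apache.spark.sql.types.{ArrayType, StringType}

object SplitJsonString {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").getOrCreate()

    // Same single-string input as the question
    val rawDF = spark.sql("select 1")
      .withColumn("item_id", lit("Item_12345"))
      .withColumn("s_tag", lit("S_12345"))
      .withColumn("jsonString", lit("""[{"First":{"Info":"ABCD123","Res":"5.2"}},{"Second":{"Info":"ABCD123","Res":"5.2"}},{"Third":{"Info":"ABCD123","Res":"5.2"}}]"""))

    // Parse the string into array<string>: with a StringType element type,
    // Spark preserves each array element as its raw JSON text, ready to explode
    val newDF = rawDF.withColumn("jsonString",
      explode(from_json(col("jsonString"), ArrayType(StringType))))

    newDF.show(false)
  }
}

This avoids hardcoding the array and should produce the three rows shown in the expected output.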

2 Comments

Hi @darkCode, thank you so much. The solution gives the exact output and solved my issue.
No problem @Lucky, could you mark the answer as correct if it worked? Thanks.
