
How can I split an array of JSON in a DataFrame into multiple rows in Spark-Scala?

Input DataFrame:

    +----------+-------+-----------------------------------------------------------------------------------------------------------------------------+
    |item_id   |s_tag  |jsonString                                                                                                                   |
    +----------+-------+-----------------------------------------------------------------------------------------------------------------------------+
    |Item_12345|S_12345|[{"First":{"Info":"ABCD123","Res":"5.2"}},{"Second":{"Info":"ABCD123","Res":"5.2"}},{"Third":{"Info":"ABCD123","Res":"5.2"}}]|
    +----------+-------+-----------------------------------------------------------------------------------------------------------------------------+


Output DataFrame:

    +----------+-------+-----------------------------------------+
    |item_id   |s_tag  |jsonString                               |
    +----------+-------+-----------------------------------------+
    |Item_12345|S_12345|{"First":{"Info":"ABCD123","Res":"5.2"}} |
    |Item_12345|S_12345|{"Second":{"Info":"ABCD123","Res":"5.2"}}|
    |Item_12345|S_12345|{"Third":{"Info":"ABCD123","Res":"5.2"}} |
    +----------+-------+-----------------------------------------+

This is what I have tried so far, but it did not work:

val rawDF = sparkSession
  .sql("select 1")
  .withColumn("item_id", lit("Item_12345"))
  .withColumn("s_tag", lit("S_12345"))
  .withColumn("jsonString", lit("""[{"First":{"Info":"ABCD123","Res":"5.2"}},{"Second":{"Info":"ABCD123","Res":"5.2"}},{"Third":{"Info":"ABCD123","Res":"5.2"}}]"""))

// fails: explode expects an array or map column, but jsonString is a plain string
val newDF = rawDF.withColumn("splittedJson", explode(rawDF.col("jsonString")))
1 Answer

The issue in the example code you posted is that jsonString is a plain string column, and explode can only be applied to array or map columns. Try something like this:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._


object tmp {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").getOrCreate()

    // Build the column as an actual array of JSON strings, one element per object
    val arr = Seq("{\"First\":{\"Info\":\"ABCD123\",\"Res\":\"5.2\"}}",
      "{\"Second\":{\"Info\":\"ABCD123\",\"Res\":\"5.2\"}}",
      "{\"Third\":{\"Info\":\"ABCD123\",\"Res\":\"5.2\"}}")
    val rawDF = spark.sql("select 1")
      .withColumn("item_id", lit("Item_12345"))
      .withColumn("s_tag", lit("S_12345"))
      .withColumn("jsonString", typedLit(arr)) // typedLit keeps the Seq as array<string>

    // explode now emits one row per array element
    val newDF = rawDF.withColumn("splittedJson", explode(rawDF.col("jsonString")))

    newDF.show()
  }
}
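
If the data really arrives as a single JSON string (as in the question) rather than something you can rebuild by hand, a minimal sketch of the same idea, assuming Spark 2.4+, is to parse the string into an array<string> with from_json and then explode the result. When the element type is StringType, Spark keeps each array element as its raw JSON text, which is exactly the shape the expected output asks for. The object name SplitJsonString is just a placeholder:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, explode, from_json, lit}
import org.apache.spark.sql.types.{ArrayType, StringType}

object SplitJsonString {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").getOrCreate()

    // Same single-string input as the question
    val rawDF = spark.sql("select 1")
      .withColumn("item_id", lit("Item_12345"))
      .withColumn("s_tag", lit("S_12345"))
      .withColumn("jsonString", lit("""[{"First":{"Info":"ABCD123","Res":"5.2"}},{"Second":{"Info":"ABCD123","Res":"5.2"}},{"Third":{"Info":"ABCD123","Res":"5.2"}}]"""))

    // Parse the string into array<string>: with a StringType element type,
    // Spark preserves each array element as its raw JSON text, ready to explode
    val newDF = rawDF.withColumn("jsonString",
      explode(from_json(col("jsonString"), ArrayType(StringType))))

    newDF.show(false)
  }
}

This avoids hardcoding the array and should produce the three rows shown in the expected output.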

2 Comments

Hi @darkCode, thank you so much. The solution gives the exact output and solved my issue.
No problem @Lucky, could you mark the answer as correct if it worked? Thanks.
