
I have a Spark Scala DataFrame that looks like this:

df.printSchema()

 |-- stock._id: string (nullable = true)
 |-- stock.value: string (nullable = true)

The second column of the DataFrame is a nested JSON string:

[ { ""warehouse"" : ""Type1"" , ""amount"" : ""0.0"" }, { ""warehouse"" : ""Type1"" , ""amount"" : ""25.0"" }]

I need to generate a DataFrame that contains the existing two columns as well as the columns from the JSON, like:

_id, value, warehouse, amount

I've tried to do it using a custom function, but I'm struggling to apply this function to my DataFrame and get the needed result:

import org.json4s._
import org.json4s.jackson.JsonMethods._

def extractWarehouses(value: String): List[(String, String)] = {
  // parse the JSON string held in the column
  val json = parse(value)
  for {
    JObject(fields) <- json
    JField("warehouse", JString(warehouse)) <- fields
    // the sample JSON quotes the amounts, so they parse as JString, not JDouble
    JField("amount", JString(amount)) <- fields
  } yield (warehouse, amount)
}
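
For illustration, this is roughly how I expected to wire it in; the df.rdd / flatMap part below is a sketch on my side, not working code:

// Sketch: apply extractWarehouses to each row and emit one record per
// warehouse entry, keeping the original two columns.
val flattened = df.rdd.flatMap { row =>
  val id    = row.getString(0)
  val value = row.getString(1)
  extractWarehouses(value).map { case (warehouse, amount) =>
    (id, value, warehouse, amount)
  }
}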

1 Answer


As you said, value is a JSON array holding a list of JSON objects. You need to explode it and extract the individual properties as columns, something like below:

import org.apache.spark.sql.functions._

val flattenedDF = df.select(col("_id"), explode(df("value")).as("value"))
val result = flattenedDF.select("_id", "value.warehouse", "value.amount")
result.printSchema()

1 Comment

Unfortunately this doesn't work. I got the error: org.apache.spark.sql.AnalysisException: cannot resolve 'explode(value)' due to data type mismatch: input to function explode should be array or map type, not StringType;
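
A sketch of one way around that type mismatch, assuming Spark 2.2+ (where from_json accepts an array schema) and that the two columns are reachable as _id and value: parse the string into a typed array first, then explode the result.

import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._

// The sample JSON quotes the amounts, so both fields are read as strings;
// amount is cast to double afterwards.
val itemSchema = ArrayType(StructType(Seq(
  StructField("warehouse", StringType),
  StructField("amount", StringType)
)))

val parsed = df.withColumn("items", from_json(col("value"), itemSchema))
val result = parsed
  .select(col("_id"), col("value"), explode(col("items")).as("item"))
  .select(
    col("_id"),
    col("value"),
    col("item.warehouse").as("warehouse"),
    col("item.amount").cast("double").as("amount")
  )
result.printSchema()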
