
I'm reading an .avro file in which one column holds its data in binary format. I currently convert the binary data to a string with a UDF to make it readable, and I then need to convert it to JSON so that I can parse the data further. Is there a way to convert a string to JSON using Spark Scala code?

Any help would be much appreciated.

val avroDF = spark.read.format("com.databricks.spark.avro").
load("file:///C:/46.avro")

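import java.nio.charset.StandardCharsets
import org.apache.spark.sql.functions.udf

// Convert the byte array to a String (assuming the payload is UTF-8 encoded;
// without an explicit charset, new String(x) uses the platform default)
val toStringDF = udf((x: Array[Byte]) => new String(x, StandardCharsets.UTF_8))

// Replace the binary column with its string form and keep only that column
val newDF = avroDF.withColumn("BODY", toStringDF(avroDF("body"))).select("BODY")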

Output of newDF is shown below:

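+---------------------------------------------------------------------------------------------------------------+
|BODY                                                                                                           |
+---------------------------------------------------------------------------------------------------------------+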
|{"VIN":"FU74HZ501740XXXXX","MSG_TYPE":"SIGNAL","TT":0,"RPM":[{"E":1566800008672,"V":1073.75},{"E":1566800002538,"V":1003.625},{"E":1566800004084,"V":1121.75}

My desired output should look like this: [image]

  • Isn't your body already in JSON format? If you want to convert that JSON string into a proper DataFrame, then this may help. Commented Sep 6, 2019 at 6:26
  • Look at from_json to convert your JSON string into a proper DataFrame (without having to re-read the data with spark.read.json()). Commented Sep 6, 2019 at 8:01
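
To illustrate the from_json suggestion, here is a minimal sketch rather than a definitive solution. The schema is an assumption read off the sample row in the question (VIN and MSG_TYPE as strings, TT as a long, RPM as an array of E/V structs), and the names `FromJsonSketch`, `bodySchema`, and `parseBody` are hypothetical. The sample JSON in `main` is the row from the question trimmed to two readings and closed by hand, since the original output is truncated.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, explode, from_json}
import org.apache.spark.sql.types._

object FromJsonSketch {
  // Assumed schema, inferred by eye from the sample row in the question
  val bodySchema: StructType = StructType(Seq(
    StructField("VIN", StringType),
    StructField("MSG_TYPE", StringType),
    StructField("TT", LongType),
    StructField("RPM", ArrayType(StructType(Seq(
      StructField("E", LongType),
      StructField("V", DoubleType)
    ))))
  ))

  // Parse the JSON string in column "BODY" and emit one row per RPM reading
  def parseBody(df: DataFrame): DataFrame =
    df.withColumn("json", from_json(col("BODY"), bodySchema))
      .select(col("json.VIN"), col("json.MSG_TYPE"), col("json.TT"),
              explode(col("json.RPM")).as("RPM"))
      .select(col("VIN"), col("MSG_TYPE"), col("TT"),
              col("RPM.E").as("E"), col("RPM.V").as("V"))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]").appName("from-json-sketch").getOrCreate()
    import spark.implicits._

    // Stand-in for newDF: a single-column DataFrame of JSON strings
    val newDF = Seq(
      """{"VIN":"FU74HZ501740XXXXX","MSG_TYPE":"SIGNAL","TT":0,"RPM":[{"E":1566800008672,"V":1073.75},{"E":1566800002538,"V":1003.625}]}"""
    ).toDF("BODY")

    parseBody(newDF).show(false)
    spark.stop()
  }
}
```

With an explicit schema, Spark does not need a separate pass over the data to infer types, but any field missing from `bodySchema` is silently dropped, so the schema would need extending if the real messages carry more signals than RPM.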

1 Answer

I don't know whether you need a generic solution, but in your particular case you can write something like this (passing a Dataset[String] to spark.read.json requires Spark 2.2 or later):

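import org.apache.spark.sql.functions.{col, explode}
import spark.implicits._  // provides the Encoder needed by .as[String]

// Parse each JSON string (schema is inferred), then flatten the RPM array
spark.read.json(newDF.as[String])
    .withColumn("RPM", explode(col("RPM")))  // one row per RPM reading
    .withColumn("E", col("RPM.E"))
    .withColumn("V", col("RPM.V"))
    .drop("RPM")
    .show()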
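
As a side note, the UDF in the question may not be needed: Spark can cast a binary column to a string directly, interpreting the bytes as UTF-8. The following is a small sketch under that assumption; `CastBodySketch` and `bodyAsString` are hypothetical names, and the input DataFrame in `main` is a made-up stand-in for the Avro data.

```scala
import java.nio.charset.StandardCharsets
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object CastBodySketch {
  // Cast the binary "body" column to a string column named "BODY", no UDF involved
  def bodyAsString(df: DataFrame): DataFrame =
    df.select(col("body").cast("string").as("BODY"))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]").appName("cast-body-sketch").getOrCreate()
    import spark.implicits._

    // Made-up stand-in for avroDF: one binary column holding UTF-8 JSON bytes
    val binaryDF = Seq(
      """{"VIN":"FU74HZ501740XXXXX","TT":0}""".getBytes(StandardCharsets.UTF_8)
    ).toDF("body")

    bodyAsString(binaryDF).show(false)
    spark.stop()
  }
}
```

The resulting single-column DataFrame can be passed to spark.read.json via .as[String] in the same way as newDF above.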