I have data in a CSV file in the format below. I want to split the JSON in the Desc column and create a new column for each key. I am using Spark 2 with Scala.

+------+------------+----------------------------------+
|  id  |  Category  |           Desc                   |
+------+------------+----------------------------------+
|  201 |  MIS20     | { "Total": 200,"Defective": 21 } |
|  202 |  MIS30     | { "Total": 740,"Defective": 58 } |
+------+------------+----------------------------------+

The desired output would be:

+------+------------+---------+-------------+
|  id  |  Category  |  Total  |  Defective  |
+------+------------+---------+-------------+
|  201 |  MIS20     |  200    |   21        |
|  202 |  MIS30     |  740    |   58        |
+------+------------+---------+-------------+

Any help is highly appreciated.

1 Answer

Create a schema for the inner JSON and apply it with the from_json function, as below:

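import org.apache.spark.sql.functions.from_json
import org.apache.spark.sql.types.{LongType, StructType}

// Schema of the JSON object stored in the Desc column
val schema = new StructType()
  .add("Total", LongType, false)
  .add("Defective", LongType, false)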

// d is the DataFrame read from the CSV; the $"..." syntax needs import spark.implicits._
d.select($"id", $"Category", from_json($"Desc", schema).as("desc"))
  .select($"id", $"Category", $"desc.*")
  .show(false)

Output:

+---+--------+-----+---------+
|id |Category|Total|Defective|
+---+--------+-----+---------+
|201|MIS20   |200  |21       |
|202|MIS30   |740  |58       |
+---+--------+-----+---------+
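
For anyone who wants to try this end to end, here is a minimal self-contained sketch of the same approach. The object name `SplitJsonColumn`, the helper `splitDesc`, and the column alias `parsed` are made up for illustration, and the input is built in memory as a stand-in for the CSV. It uses `col(...)` instead of `$"..."` so the helper does not depend on `spark.implicits._`, and it leaves the schema fields nullable, on the assumption that rows with unparseable JSON should come out as nulls rather than fail.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, from_json}
import org.apache.spark.sql.types.{LongType, StructType}

object SplitJsonColumn {
  // Schema of the JSON object in Desc; fields are nullable by default
  val descSchema: StructType = new StructType()
    .add("Total", LongType)
    .add("Defective", LongType)

  // Parse Desc into a struct, then expand the struct fields into top-level columns
  def splitDesc(d: DataFrame): DataFrame =
    d.select(col("id"), col("Category"), from_json(col("Desc"), descSchema).as("parsed"))
      .select(col("id"), col("Category"), col("parsed.*"))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("SplitJsonColumn")
      .master("local[*]") // local mode, for experimenting only
      .getOrCreate()
    import spark.implicits._

    // In-memory stand-in for the CSV data from the question
    val d = Seq(
      (201, "MIS20", """{ "Total": 200,"Defective": 21 }"""),
      (202, "MIS30", """{ "Total": 740,"Defective": 58 }""")
    ).toDF("id", "Category", "Desc")

    splitDesc(d).show(false)
    spark.stop()
  }
}
```

With a real file, `d` would presumably come from something like `spark.read.option("header", "true").csv(...)`. Since the JSON itself contains commas, the Desc field has to be quoted in the file for it to end up in a single column.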

Hope this helps!
