How to parse json string to different columns in spark scala?

Question

Reading a parquet file gives the following data:

|id |name |activegroup|
|1  |abc  |[{"groupID":"5d","role":"admin","status":"A"},{"groupID":"58","role":"admin","status":"A"}]|

The data type of each field:

root
 |-- id: int
 |-- name: string
 |-- activegroup: string

The activegroup column is a string, so the explode function does not work on it. The required output is:

|id |name |groupID|role|status|
|1  |abc  |5d     |admin|A    |
|1  |abc  |58     |admin|A    |

Please help me parse the above with the latest version of Spark (Scala).

How can this be solved for Spark 2.3?

– Etisha, Commented Jul 26, 2021 at 11:03

SCouto · Accepted Answer · 2020-04-12 16:34:45Z

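First you need to extract the JSON schema from a sample value (this assumes `import org.apache.spark.sql.functions._` and `import spark.implicits._` are in scope):

  val schema = schema_of_json(lit(df.select($"activegroup").as[String].first))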

Once you have it, you can convert the activegroup column from a String to an array of structs with from_json, and then explode it.

Once the column has been parsed, you can extract its values with $"columnName.field":

  // Field names follow the JSON keys, so the struct field is "groupID".
  val dfresult = df
    .withColumn("jsonColumn", explode(from_json($"activegroup", schema)))
    .select($"id", $"name",
            $"jsonColumn.groupID" as "groupId",
            $"jsonColumn.role" as "role",
            $"jsonColumn.status" as "status")

If you want to extract every field and the JSON element names are fine as column names, you can use * instead:

  val dfresult = df
    .withColumn("jsonColumn", explode(from_json($"activegroup", schema)))
    .select($"id", $"name", $"jsonColumn.*")

RESULT

+---+----+-------+-----+------+
| id|name|groupId| role|status|
+---+----+-------+-----+------+
|  1| abc|     5d|admin|     A|
|  1| abc|     58|admin|     A|
+---+----+-------+-----+------+
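
For anyone who wants to try the steps above end to end, here is a minimal self-contained sketch. The object name `ParseJsonColumn`, the helpers `sampleData` and `explodeGroups`, and the local SparkSession settings are made up for illustration; the in-memory row stands in for the parquet file from the question. It assumes Spark 2.4 or later (for `schema_of_json`) and that every row has the same JSON shape as the first one.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, explode, from_json, lit, schema_of_json}

object ParseJsonColumn {
  // Hypothetical stand-in for the parquet data: one row, JSON array stored as a string.
  def sampleData(spark: SparkSession): DataFrame = {
    import spark.implicits._
    Seq(
      (1, "abc",
        """[{"groupID":"5d","role":"admin","status":"A"},{"groupID":"58","role":"admin","status":"A"}]""")
    ).toDF("id", "name", "activegroup")
  }

  def explodeGroups(df: DataFrame): DataFrame = {
    // Infer the schema from the first row only (assumption: all rows share this shape).
    val sample = df.select("activegroup").first.getString(0)
    val schema = schema_of_json(lit(sample)) // needs Spark 2.4 or later

    // Parse the string into an array of structs, emit one row per element,
    // then flatten the struct fields into top-level columns.
    df.withColumn("jsonColumn", explode(from_json(col("activegroup"), schema)))
      .select(col("id"), col("name"), col("jsonColumn.*"))
  }

  def main(args: Array[String]): Unit = {
    // Local session for illustration only.
    val spark = SparkSession.builder().master("local[*]").appName("parse-json-column").getOrCreate()
    explodeGroups(sampleData(spark)).show()
    spark.stop()
  }
}
```

With `jsonColumn.*` the new columns take their names from the JSON keys, so the first one comes out as groupID unless you alias it.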

2 Comments

Etisha Over a year ago

This is not working with Spark SQL 2.3. Can someone help me solve it for this version? @SCouto

SCouto Over a year ago

It should work. I think I answered this with Spark 2.4. What error do you get?
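
The error on 2.3 is not stated in this thread, but one likely cause is that `schema_of_json` was only added in Spark 2.4. A possible workaround, sketched below, is to write the schema by hand and pass it to `from_json`, which accepts a `DataType` schema on older 2.x releases. The object name `ParseJsonColumnExplicitSchema` and the helper `explodeGroups` are hypothetical, and the schema assumes the three string fields shown in the question's sample data.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, explode, from_json}
import org.apache.spark.sql.types.{ArrayType, StringType, StructField, StructType}

object ParseJsonColumnExplicitSchema {
  // Hand-written schema (assumption): an array of structs with the three
  // string fields that appear in the question's sample JSON.
  val groupSchema: ArrayType = ArrayType(StructType(Seq(
    StructField("groupID", StringType),
    StructField("role", StringType),
    StructField("status", StringType)
  )))

  def explodeGroups(df: DataFrame): DataFrame =
    // No schema inference here, so schema_of_json is not needed.
    df.withColumn("jsonColumn", explode(from_json(col("activegroup"), groupSchema)))
      .select(col("id"), col("name"), col("jsonColumn.*"))

  def main(args: Array[String]): Unit = {
    // Local session and in-memory row for illustration only.
    val spark = SparkSession.builder().master("local[*]").appName("parse-json-explicit").getOrCreate()
    import spark.implicits._
    val df = Seq(
      (1, "abc",
        """[{"groupID":"5d","role":"admin","status":"A"},{"groupID":"58","role":"admin","status":"A"}]""")
    ).toDF("id", "name", "activegroup")
    explodeGroups(df).show()
    spark.stop()
  }
}
```

An explicit schema also avoids the extra job that reads the first row, and it does not depend on the first row being representative of the rest.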
