spark scala : Convert Array of Struct column to String column

Question

I have a column, which is of type array < Struct > deduced from json file. I want to convert the array < Struct > into string, so that i can keep this array column as-is in hive and export it to RDBMS as a single column.

temp.json

{"properties":{"items":[{"invoicid":{"value":"923659"},"job_id":
{"value":"296160"},"sku_id":
{"value":"312002"}}],"user_id":"6666","zip_code":"666"}}

Processing :

scala> val temp = spark.read.json("s3://check/1/temp1.json")
temp: org.apache.spark.sql.DataFrame = [properties: struct<items:
array<struct<invoicid:struct<value:string>,job_id:struct<value:string>,sku_id:struct<value:string>>>, user_id: string ... 1 more field>]

    scala> temp.printSchema
    root
     |-- properties: struct (nullable = true)
     |    |-- items: array (nullable = true)
     |    |    |-- element: struct (containsNull = true)
     |    |    |    |-- invoicid: struct (nullable = true)
     |    |    |    |    |-- value: string (nullable = true)
     |    |    |    |-- job_id: struct (nullable = true)
     |    |    |    |    |-- value: string (nullable = true)
     |    |    |    |-- sku_id: struct (nullable = true)
     |    |    |    |    |-- value: string (nullable = true)
     |    |-- user_id: string (nullable = true)
     |    |-- zip_code: string (nullable = true)


scala> temp.select("properties").show
+--------------------+
|          properties|
+--------------------+
|[WrappedArray([[9...|
+--------------------+


scala> temp.select("properties.items").show
+--------------------+
|               items|
+--------------------+
|[[[923659],[29616...|
+--------------------+


scala> temp.createOrReplaceTempView("tempTable")

scala> spark.sql("select properties.items  from tempTable").show
+--------------------+
|               items|
+--------------------+
|[[[923659],[29616...|
+--------------------+

How can i get the result like:

+-----------------------------------------------------------------------------------------+
|               items                                                                     |
+-----------------------------------------------------------------------------------------+
[{"invoicid":{"value":"923659"},"job_id":{"value":"296160"},"sku_id":{"value":"312002"}}] |
+-----------------------------------------------------------------------------------------+

to get the array element value without any change.

[{"invoicid":{"value":"923659"},"job_id":{"value":"296160"},"sku_id":{"value":"312002"}}] — Manoj Pandey
– Manoj Pandey, Commented Mar 1, 2019 at 12:52

zero323 · Accepted Answer · 2017-06-02 10:59:41Z

17

to_json is the function you're looking for

import org.apache.spark.sql.functions.to_json:

val df = spark.read.json(sc.parallelize(Seq("""
  {"properties":{"items":[{"invoicid":{"value":"923659"},"job_id":
  {"value":"296160"},"sku_id":
  {"value":"312002"}}],"user_id":"6666","zip_code":"666"}}""")))


df
  .select(get_json_object(to_json($"properties"), "$.items").alias("items"))
  .show(false)

+-----------------------------------------------------------------------------------------+
|items                                                                                    |
+-----------------------------------------------------------------------------------------+
|[{"invoicid":{"value":"923659"},"job_id":{"value":"296160"},"sku_id":{"value":"312002"}}]|
+-----------------------------------------------------------------------------------------+

edited Jun 2, 2017 at 10:59

answered Jun 2, 2017 at 10:51

zero323

331k108 gold badges981 silver badges958 bronze badges

Sign up to request clarification or add additional context in comments.

2 Comments

user1384205 Over a year ago

How would you extract all the columns attached to the root struct? For eg, if "properties" did not exist, I was hoping select(get_json_object(to_json(($".*")),"$.value")) would work. But it doesnt.

zero323 Over a year ago

to_json(struct(df.columns map col: _*))

Collectives™ on Stack Overflow

spark scala : Convert Array of Struct column to String column

1 Answer 1

2 Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

2 Comments

Your Answer

Sign up or log in

Post as a guest

Related