
I have a dataframe with the following schema:

 |-- A: map (nullable = true)
 |    |-- key: string
 |    |-- value: array (valueContainsNull = true)
 |    |    |-- element: struct (containsNull = true)
 |    |    |    |-- id: string (nullable = true)
 |    |    |    |-- type: string (nullable = true)
 |    |    |    |-- index: boolean (nullable = false)
 |-- idkey: string (nullable = true)

Since the value in the map is of type array, I need to extract the field index corresponding to the id in the "foreign" key field idkey.

For example, I have the following data:

 {"A":{
 "innerkey_1":[{"id":"1","type":"0.01","index":true},
               {"id":"6","type":"4.3","index":false}]},
 "idkey":"1"}

Since the idkey is 1, we need to output the value of index from the element where "id":"1", i.e. the index should be equal to true. I am really not sure how I can accomplish this, with UDFs or otherwise.

Expected output is:

+---------+
| indexout|
+---------+
|   true  |
+---------+
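To make the goal concrete outside Spark: in plain Scala collections, the lookup I want amounts to the following (a sketch with a hypothetical case class mirroring the struct, not the actual dataframe API):

```scala
// Mirrors the struct {id, type, index} inside the map's arrays.
// `type` is a Scala keyword, so it needs backticks as a field name.
case class Element(id: String, `type`: String, index: Boolean)

// The map column A, as plain Scala data.
val a: Map[String, Seq[Element]] = Map(
  "innerkey_1" -> Seq(
    Element("1", "0.01", index = true),
    Element("6", "4.3", index = false)
  )
)
val idkey = "1"

// Flatten all arrays in the map, find the element whose id matches
// idkey, and return its index field.
val indexout: Option[Boolean] =
  a.values.flatten.find(_.id == idkey).map(_.index)
```

Here `indexout` would be `Some(true)` for the sample data above.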
  • Can you clarify "i.e. the index should be equal to 0"? And can you share your expected output too? Commented Mar 13, 2018 at 5:35
  • And how can 1 be a boolean value? And the type field seems to be double, not string? Commented Mar 13, 2018 at 5:44
  • I have fixed the typos, thanks for pointing them out. Commented Mar 13, 2018 at 13:06
  • index false has id 6; they don't match idkey with id. The matching index should be true. Commented Mar 13, 2018 at 13:18
  • Aren't "Since the idkey is 1, we need to output the value of index corresponding to the element where "id":1" and "i.e. the index should be equal to false" contradicting each other? Commented Mar 13, 2018 at 13:59

1 Answer


If your dataframe has following schema

root
 |-- A: map (nullable = true)
 |    |-- key: string
 |    |-- value: array (valueContainsNull = true)
 |    |    |-- element: struct (containsNull = true)
 |    |    |    |-- id: string (nullable = true)
 |    |    |    |-- type: string (nullable = true)
 |    |    |    |-- index: boolean (nullable = false)
 |-- idkey: string (nullable = true)

then you can use two explode functions, one for the map and one for the inner array, followed by a filter to keep the element whose id matches idkey, and finally select the index:

import org.apache.spark.sql.functions._

df.select(col("idkey"), explode(col("A")))               // one row per (key, value) map entry
  .select(col("idkey"), explode(col("value")).as("value")) // one row per struct in the array
  .filter(col("idkey") === col("value.id"))              // keep the element whose id matches idkey
  .select(col("value.index").as("indexout"))

You should get

+--------+
|indexout|
+--------+
|true    |
+--------+
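For intuition, the two explodes and the filter correspond to this plain-Scala pipeline over the same data (a sketch with a hypothetical case class, not the Spark API itself):

```scala
// Mirrors the struct inside the map's arrays; `type` is a Scala
// keyword, so the field name needs backticks.
case class Item(id: String, `type`: String, index: Boolean)

val a: Map[String, Seq[Item]] = Map(
  "innerkey_1" -> Seq(Item("1", "0.01", true), Item("6", "4.3", false))
)
val idkey = "1"

val indexout: Seq[Boolean] = a.toSeq          // explode(col("A")): one row per map entry
  .flatMap { case (_, items) => items }       // explode(col("value")): one row per struct
  .filter(_.id == idkey)                      // filter(col("idkey") === col("value.id"))
  .map(_.index)                               // select(col("value.index"))
```

With the sample data, `indexout` would be `Seq(true)`, matching the output above.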

Using udf function

You can do the above with a udf function, which avoids the two explodes and the filter: all of that work happens inside the udf itself. You can modify it according to your needs.

import org.apache.spark.sql.Row
import org.apache.spark.sql.functions._

// Structs inside the map's arrays arrive in the udf as Rows.
def indexoutUdf = udf((a: Map[String, Seq[Row]], idkey: String) => {
  a.values.flatten                                 // all structs from all map entries
    .filter(y => y.getAs[String]("id") == idkey)   // match on the id field by name
    .map(y => y.getAs[Boolean]("index"))           // pull out the index field
    .head
})
df.select(indexoutUdf(col("A"), col("idkey")).as("indexout")).show(false)

I hope the answer is helpful.


2 Comments

Is there a way to do it other than using explode? I considered it but it will be too expensive for large dataframes.
@PramodKumar, I have updated the answer :) I hope the answer is going to be upvoted and accepted this time ;)
