I have the following DataFrame:

DF1:
+------+---------+
|key1  |Value    |
+------+---------+
|[k, l]|      1  |
|[m, n]|      2  |
|[o]   |      3  |
+------+---------+

that needs to be 'joined' with another dataframe

DF2:
+----+
|key2|
+----+
|k   |
|l   |
|m   |
|n   |
|o   |
+----+

so that the output looks like this:

DF3:
+--------------------+---------+
|key3                |Value    |
+--------------------+---------+
|k:1 l:1 m:0 n:0 o:0 |      1  |
|k:0 l:0 m:1 n:1 o:0 |      2  |
|k:0 l:0 m:0 n:0 o:1 |      3  |
+--------------------+---------+

In other words, the output dataframe should have a column that is a string of all rows in DF2, and each element should be followed by a 1 or 0 indicating whether that element was present in the list in the column key1 of DF1.

I am not sure how to go about it. Is there a simple UDF I can write to accomplish what I want?

1 Answer

For an operation like this to be possible, DF2 has to be small enough to collect to the driver, so you can just use a udf:

import spark.implicits._
import org.apache.spark.sql.functions._

val df1 = Seq(
  (Seq("k", "l"), 1), (Seq("m", "n"), 2), (Seq("o"), 3)
).toDF("key1", "value")
val df2 = Seq("k", "l", "m", "n", "o").toDF("key2")

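// Collect DF2 to the driver and start every key at 0.
val keys = df2.as[String].collect.map((_, 0)).toMap

// For each row, flip the entries for the keys present in its list to 1.
val toKeyMap = udf((xs: Seq[String]) =>
   xs.foldLeft(keys)((acc, x) => acc + (x -> 1)))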


df1.select(toKeyMap($"key1").alias("key3"), $"value").show(false)

// +-------------------------------------------+-----+
// |key3                                       |value|
// +-------------------------------------------+-----+
// |Map(n -> 0, m -> 0, l -> 1, k -> 1, o -> 0)|1    |
// |Map(n -> 1, m -> 1, l -> 0, k -> 0, o -> 0)|2    |
// |Map(n -> 0, m -> 0, l -> 0, k -> 0, o -> 1)|3    |
// +-------------------------------------------+-----+

If you want just a string:

val toKeyMapString = udf((xs: Seq[String]) => 
   xs.foldLeft(keys)((acc, x) => acc + (x -> 1))
     .map { case (k, v) => s"$k: $v" }
     .mkString(" ")
)


df1.select(toKeyMapString($"key1").alias("key3"), $"value").show(false)
// +------------------------+-----+
// |key3                    |value|
// +------------------------+-----+
// |n: 0 m: 0 l: 1 k: 1 o: 0|1    |
// |n: 1 m: 1 l: 0 k: 0 o: 0|2    |
// |n: 0 m: 0 l: 0 k: 0 o: 1|3    |
// +------------------------+-----+
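
Note that the string above follows the Map's iteration order, which is not the order of DF2, and uses "k: 1" rather than the "k:1" asked for in the question. If the order matters, one option is to keep the collected keys as a Seq and walk it in order. The following is only a minimal sketch in plain Scala, so it runs without a SparkSession; the names KeyIndicator, orderedKeys and toKeyString are made up for illustration, and orderedKeys stands in for something like df2.as[String].collect.toSeq (sorted first if DF2 has no reliable row order).

```scala
object KeyIndicator {
  // Assumption: stands in for the keys collected from DF2, in the desired order.
  val orderedKeys: Seq[String] = Seq("k", "l", "m", "n", "o")

  // Emit "key:1" if the key is in the row's list, "key:0" otherwise,
  // always walking the keys in the order given by orderedKeys.
  def toKeyString(xs: Seq[String]): String = {
    val present = xs.toSet
    orderedKeys
      .map(k => s"$k:${if (present(k)) 1 else 0}")
      .mkString(" ")
  }

  def main(args: Array[String]): Unit = {
    println(toKeyString(Seq("k", "l"))) // k:1 l:1 m:0 n:0 o:0
    println(toKeyString(Seq("m", "n"))) // k:0 l:0 m:1 n:1 o:0
    println(toKeyString(Seq("o")))      // k:0 l:0 m:0 n:0 o:1
  }
}
```

With Spark in scope this could presumably be wrapped the same way as above, e.g. udf(KeyIndicator.toKeyString _), and applied to $"key1".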