2

I have 2 dataframes df1 and df2. df1 has 1 column key of type String

df1.show()

key
----
k1
k2
k3

df2 has 2 columns
df2.show()
topic | keys
-------------
 t1  | [k1, k2]
 t2  | [pk1, pk2]

I want to join the 2 dataframes when df1.key is present in df2.keys. I saw previous examples posted here Spark: Join dataframe column with an array

However, I am looking for a whole word match. Contains method is joining rows that have a partial match. What I mean is in the above example, I don't want k2 to be joined with [pk1, pk2] because array does not contain key k2, it contains pk2.

Can someone suggest how to join in this case ? Please provide example in JAVA.

2 Answers 2

6

Function "array_contains" can be used:

val df1 = List("k1", "k2", "k3").toDF("key")
val df2 = List(
  ("t1", Array("k1", "k2")),
  ("t2", Array("pk1", "pk2"))
).toDF("topic", "keys")
val result = df1.join(df2, expr("array_contains(keys,key)"))
result.show(false)

Output:

+---+-----+--------+
|key|topic|keys    |
+---+-----+--------+
|k1 |t1   |[k1, k2]|
|k2 |t1   |[k1, k2]|
+---+-----+--------+
Sign up to request clarification or add additional context in comments.

Comments

0

What you can do is explode your array and get one line per key like that :

df2 = df2.withColumn("key", explode(df2.col("keys")))
df2.show()

+-----+----------+---+
|topic|      keys|key|
+-----+----------+---+
|   t1|  [k1, k2]| k1|
|   t1|  [k1, k2]| k2|
|   t2|[pk1, pk2]|pk1|
|   t2|[pk1, pk2]|pk2|
+-----+----------+---+

Then you can join on this new column :

Dataset<Row> result = df2.join(key, df2.col("key").equalTo(df1.col("key")), "inner")
result.show()

+-----+--------+---+---+
|topic|    keys|key|key|
+-----+--------+---+---+
|   t1|[k1, k2]| k1| k1|
|   t1|[k1, k2]| k2| k2|
+-----+--------+---+---+

Note that it is not very efficient because it duplicates the data.

1 Comment

Yeah, I thought about explode. But is there a better alternative ? Can joins take something like UDF(List<String>, String) and return a DataType.Boolean which can be used as a join condition ?

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.