I have the following DataFrame:
DF1:
+------+---------+
|key1  |Value    |
+------+---------+
|[k, l]|1        |
|[m, n]|2        |
|[o]   |3        |
+------+---------+
that needs to be 'joined' with another DataFrame:
DF2:
+----+
|key2|
+----+
|k   |
|l   |
|m   |
|n   |
|o   |
+----+
so that the output looks like this:
DF3:
+--------------------+---------+
|key3                |Value    |
+--------------------+---------+
|k:1 l:1 m:0 n:0 o:0 |1        |
|k:0 l:0 m:1 n:1 o:0 |2        |
|k:0 l:0 m:0 n:0 o:1 |3        |
+--------------------+---------+
In other words, the output DataFrame should have a column key3 containing a single string that lists every value of key2 from DF2, each followed by :1 if that value appears in the row's key1 list in DF1 and :0 otherwise. The Value column is carried over unchanged.
I am not sure how to go about it. Is there a simple UDF I can write to accomplish what I want?
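The core of such a UDF would be a per-row function that takes one key1 list plus the full list of DF2 keys. Below is a minimal sketch in plain Python, assuming DF2 is small enough to collect into a list and that the keys should appear in sorted order as in the example. The names `encode_keys`, `df1_rows` and `df2_keys` are hypothetical, not part of any library, and the row lists are in-memory stand-ins for the DataFrames.

```python
from typing import List, Optional, Sequence, Tuple


def encode_keys(present: Optional[Sequence[str]], all_keys: Sequence[str]) -> str:
    """Return 'key:1' or 'key:0' for every key in all_keys, space-separated.

    present is one row's key1 list; a null (None) list is treated as empty.
    """
    present_set = set(present or [])  # set gives O(1) membership checks
    return " ".join(f"{k}:{1 if k in present_set else 0}" for k in all_keys)


# Hypothetical in-memory stand-ins for DF1 and DF2 from the question.
df1_rows: List[Tuple[List[str], int]] = [(["k", "l"], 1), (["m", "n"], 2), (["o"], 3)]
df2_keys: List[str] = ["k", "l", "m", "n", "o"]

# Build DF3's rows: the encoded key string plus the untouched Value.
df3_rows = [(encode_keys(key1, df2_keys), value) for key1, value in df1_rows]

for key3, value in df3_rows:
    print(key3, "|", value)
```

To use this in PySpark, one option (again assuming DF2 fits in driver memory) is to collect the keys with something like `keys = sorted(r.key2 for r in df2.collect())`, wrap the function as `pyspark.sql.functions.udf(lambda ks: encode_keys(ks, keys), StringType())`, and apply it with `df1.withColumn("key3", ...)`. Sorting matters because `collect()` does not guarantee row order without an explicit `orderBy`.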