I have a spark dataframe, and I wish to check whether each string in a particular column exists in a pre-defined a column of another dataframe. I have found a same problem in Spark (scala) dataframes - Check whether strings in column contain any items from a set
but I want to Check whether strings in column exists in a column of another dataframe not a List or a set follow that question. Who can help me! I don't know convert a column to a set or a list and i don't know "exists" method in dataframe.
My data is similar to this
df1:
+---+-----------------+
| id| url |
+---+-----------------+
| 1|google.com |
| 2|facebook.com |
| 3|github.com |
| 4|stackoverflow.com|
+---+-----------------+
df2:
+-----+------------+
| id | urldetail |
+-----+------------+
| 11 |google.com |
| 12 |yahoo.com |
| 13 |facebook.com|
| 14 |twitter.com |
| 15 |youtube.com |
+-----+------------+
Now, i am trying to create a third column with the results of a comparison to see if the strings in the $"urldetail" column if exists in $"url"
+---+------------+-------------+
| id| urldetail | check |
+---+------------+-------------+
| 11|google.com | 1 |
| 12|yahoo.com | 0 |
| 13|facebook.com| 1 |
| 14|twitter.com | 0 |
| 15|youtube.com | 0 |
+---+------------+-------------+
I want to use UDF but i don't know how to check whether string exists in a column of a dataframe! Please help me!
inputsandoutputs