3

Spark 1.6.2 and Scala 2.10 here.

I want to filter the spark dataframe column with an array of strings.

val df1 = sc.parallelize(Seq((1, "L-00417"), (3, "L-00645"), (4, "L-99999"),(5, "L-00623"))).toDF("c1","c2")
+---+-------+
| c1|     c2|
+---+-------+
|  1|L-00417|
|  3|L-00645|
|  4|L-99999|
|  5|L-00623|
+---+-------+

val df2 = sc.parallelize(Seq((1, "L-1"), (3, "L-2"), (4, "L-3"),(5, "L-00623"))).toDF("c3","c4")

+---+-------+
| c3|     c4|
+---+-------+
|  1|    L-1|
|  3|    L-2|
|  4|    L-3|
|  5|L-00623|
+---+-------+

val c2List = df1.select("c2").as[String].collect()

df2.filter(not($"c4").contains(c2List)).show()`

I am getting below error.

Unsupported literal type class [Ljava.lang.String; [Ljava.lang.String;@5ce1739c

Can anyone please help to fix this?

1 Answer 1

3

First, contains isn't suitable because you're looking for the opposite relationship - you want to check if c2List contains c4's value, and not the other way around.

You can use isin for that - which uses "repeated argument" (similar to Java's "varargs") of the values to match, so you'd want to "expand" c2List into a repeated argument, which can be done using the : _* operator:

df2.filter(not($"c4".isin(c2List: _*)))

Alternatively, with Spark 1.6 you can use an "left anti join", to join the two dataframes and get only values in df2 that did NOT match values in df1:

df2.join(df1, $"c2" === $"c4", "leftanti")

Unlike the previous, this option is not limited to the case where df1 is small enough to be collected.

Lastly, if you're using earlier Spark version, you can immitate leftanti using a left join and a filter:

df2.join(df1, $"c2" === $"c4", "left").filter($"c2".isNull).select("c3", "c4")
Sign up to request clarification or add additional context in comments.

3 Comments

leftanti i think its not in Spark 1.6.2 version
df2.filter(not($"c4".isin(c2List: _*))) This worked for small set. Let me check for 1500 values set which I have in c2List actually.
Hello Tzach, Thanks. First approach worked. df2.filter(not($"c4".isin(c2List: _*)))

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.