
Is there a way to pass an argument to a Spark UDF in addition to the column? I know you can use currying in Scala, but it doesn't work the way I'd like it to.

Let's take this function as an example:

def containsWord(word: String, words: Seq[String]): Boolean = {
  for (w <- words) if (word.contains(w)) return true
  false
}

The word string is the value I want to take from the column. Without the second argument, I could create the UDF with the udf function and pass it the column as a parameter. How can I also pass the String sequence in the UDF call?

Any help would be appreciated.


1 Answer


You don't really need currying here (although the idea is similar). You can just define a function that takes your sequence as a parameter and returns a udf:

import org.apache.spark.sql.functions.udf
def containsWord(words: Seq[String]) = udf((word: String) => words.contains(word))

And then use it like this:

sc.parallelize(Seq("a", "b", "c", "d", "e"))
    .toDF("A")
    .withColumn("B", containsWord(Seq("a", "b", "d"))($"A"))
    .show

And it gives you this:

+---+-----+
|  A|    B|
+---+-----+
|  a| true|
|  b| true|
|  c|false|
|  d| true|
|  e|false|
+---+-----+
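
Note that words.contains(word) tests whether the column value is an element of the sequence, while the function in the question tests whether the column value contains any of the words as a substring. If the substring behaviour is what you want, the same pattern applies. The following is only a sketch: the names containsAny and containsAnyUdf are made up for illustration, and it assumes Spark 2.x or later with a local SparkSession.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.UserDefinedFunction
import org.apache.spark.sql.functions.udf

object ContainsAnyExample {
  // Plain Scala predicate (hypothetical name): true if `word` contains
  // at least one element of `words` as a substring. Nulls map to false.
  def containsAny(words: Seq[String])(word: String): Boolean =
    word != null && words.exists(w => word.contains(w))

  // UDF factory (hypothetical name): `words` is captured in the closure,
  // and the column value arrives as `word` when the UDF is applied.
  def containsAnyUdf(words: Seq[String]): UserDefinedFunction =
    udf((word: String) => containsAny(words)(word))

  def main(args: Array[String]): Unit = {
    // Assumption: a local session is good enough for trying this out.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("contains-any")
      .getOrCreate()
    import spark.implicits._

    Seq("apple pie", "banana", "cherry")
      .toDF("A")
      .withColumn("B", containsAnyUdf(Seq("app", "err"))($"A"))
      .show()

    spark.stop()
  }
}
```

Keeping the predicate as a plain Scala function means it can be checked without starting Spark, and the UDF factory stays a thin wrapper around it.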