Problem: please give any solutions in Java (not Scala or Python).

I have a DataFrame with the following data:

colA, colB
23,44
24,64

What I want is a DataFrame like this:

colA, colB, colC
23,44, result of myFunction(23,44)
24,64, result of myFunction(24,64)

Basically, I would like to add a column to the DataFrame in Java, where the value of the new column is found by passing the values of colA and colB through a complex function that returns a String.

Here is what I've tried, but the parameter passed to complexFunction seems to be only the column reference 'colA', rather than the value in colA.

myDataFrame.withColumn("ststs", (complexFunction(myDataFrame.col("colA")))).show();
1 Answer

As suggested in the comments, you should use a User Defined Function (UDF). Let's suppose that you have a myFunction method which does the complex processing (shown here in Scala):

val myFunction : (Int, Int) => String = (colA, colB) => {...}

Then all you need to do is turn your function into a UDF and apply it to columns A and B:

import org.apache.spark.sql.functions.{udf, col}

val myFunctionUdf = udf(myFunction)
myDataFrame.withColumn("colC", myFunctionUdf(col("colA"), col("colB")))
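
Since the question asks for Java, here is a rough sketch of the same idea using Spark's Java API. It assumes Spark 2.3 or later (which, as far as I know, is when the `udf(UDF2, DataType)` overload became available to Java), a local `SparkSession`, and a placeholder `myFunction` body that simply joins the two values, because the real function isn't shown. The class name `AddColumnWithUdf` and the in-memory sample rows are only there for illustration.

```java
import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.udf;

import java.util.Arrays;
import java.util.List;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.api.java.UDF2;
import org.apache.spark.sql.expressions.UserDefinedFunction;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructField;
import org.apache.spark.sql.types.StructType;

public class AddColumnWithUdf {

    // Hypothetical stand-in for the real complex function from the question.
    static String myFunction(int a, int b) {
        return a + "-" + b;
    }

    public static void main(String[] args) {
        // Assumption: a local session is good enough for trying this out.
        SparkSession spark = SparkSession.builder()
                .appName("AddColumnWithUdf")
                .master("local[*]")
                .getOrCreate();

        // Rebuild the two-column sample DataFrame from the question.
        StructType schema = DataTypes.createStructType(new StructField[] {
                DataTypes.createStructField("colA", DataTypes.IntegerType, false),
                DataTypes.createStructField("colB", DataTypes.IntegerType, false)
        });
        List<Row> rows = Arrays.asList(RowFactory.create(23, 44), RowFactory.create(24, 64));
        Dataset<Row> myDataFrame = spark.createDataFrame(rows, schema);

        // Wrap the two-argument function as a UDF that returns a string.
        // The cast tells the compiler which udf(...) overload the lambda targets.
        UserDefinedFunction myFunctionUdf = udf(
                (UDF2<Integer, Integer, String>) (a, b) -> myFunction(a, b),
                DataTypes.StringType);

        // Spark evaluates the UDF once per row with that row's colA and colB values.
        myDataFrame
                .withColumn("colC", myFunctionUdf.apply(col("colA"), col("colB")))
                .show();

        spark.stop();
    }
}
```

The attempt in the question hands a `Column` object to a plain Java method, so the method only ever sees a column reference. Wrapping the method in a UDF is what lets Spark call it with the actual per-row values.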

I hope it helps

2 Comments

This doesn't seem to work in Java; it doesn't let me pass more than one variable.
This appears to be written in Scala, not Java as specified in the question.