
Is this possible in Spark with Scala? I am using Spark 2.2.

val func="""withColumn("seq", lit("this is seq"))
           .withColumn("id", lit("this is id"))
           .withColumn("type", lit("this is type"))"""

Then I would like to use the above variable on a DataFrame (df) like this:

val df2=df.$func

The reason I am saving those functions to a variable is that I want to apply functions dynamically based on conditions. Sometimes I may want one withColumn call and sometimes several.

Appreciate any help. Thanks!

1 Answer

If I understood correctly, you can do this using foldLeft.

Suppose you have a DataFrame df such as:

val df: DataFrame = Seq(("123"), ("123"), ("223"), ("223")).toDF()

You can create a list that pairs each new column name with the column expression to apply:

val list = List(
  ("seq", lit("this is seq")),
  ("id", lit("this is id")),
  ("type" , lit("thisis type"))
)

Now you can use foldLeft to apply each entry of this list to the DataFrame in turn:

list.foldLeft(df){(tempDF, listValue) =>
  tempDF.withColumn(listValue._1, listValue._2)
}
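To cover the "sometimes one withColumn, sometimes several" case from the question, the list itself can be built conditionally before folding. The following is only a sketch under assumptions: the flags addSeq and addId are hypothetical stand-ins for whatever conditions drive the choice, and the local SparkSession is there just to make the snippet self-contained.

```scala
import org.apache.spark.sql.{Column, DataFrame, SparkSession}
import org.apache.spark.sql.functions.lit

// Local session only so the sketch runs on its own; in spark-shell `spark` already exists.
val spark = SparkSession.builder().master("local[*]").appName("dynamic-columns").getOrCreate()
import spark.implicits._

val df: DataFrame = Seq("123", "223").toDF()

// Hypothetical flags (assumption): replace with your real conditions.
val addSeq = true
val addId = false

// Keep only the (name, column) pairs whose flag is set.
val wanted: List[(String, Column)] = List(
  (addSeq, "seq" -> lit("this is seq")),
  (addId, "id" -> lit("this is id"))
).collect { case (true, pair) => pair }

// Same foldLeft as above, applied to the filtered list.
val df2 = wanted.foldLeft(df) { case (tempDF, (name, value)) =>
  tempDF.withColumn(name, value)
}
```

With these flag values only the seq column is added; if the list ends up empty, foldLeft simply returns df unchanged.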

A better solution is to build a single select from the DataFrame's existing columns plus the values in the list, so that all new columns are added in one projection:

val columns = df.columns.map(col) ++ list.map(r => r._2 as r._1)
df.select(columns: _*)

Final Result:

+-----+-----------+----------+-----------+
|value|seq        |id        |type       |
+-----+-----------+----------+-----------+
|123  |this is seq|this is id|thisis type|
|123  |this is seq|this is id|thisis type|
|223  |this is seq|this is id|thisis type|
|223  |this is seq|this is id|thisis type|
+-----+-----------+----------+-----------+
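If the steps are a mix of withColumn and select that varies by input, one option is to store each step as a DataFrame => DataFrame function rather than as a string, and compose whichever ones apply. This is a sketch, not the only way to do it: the Step alias, the step names and the narrow flag are all made up for illustration, and Function.chain comes from the Scala standard library.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.lit

// Local session only so the sketch runs on its own; in spark-shell `spark` already exists.
val spark = SparkSession.builder().master("local[*]").appName("dynamic-steps").getOrCreate()
import spark.implicits._

val df: DataFrame = Seq("123", "223").toDF()

// Hypothetical alias: each step takes a DataFrame and returns a new one.
type Step = DataFrame => DataFrame

val addSeq: Step = _.withColumn("seq", lit("this is seq"))
val addType: Step = _.withColumn("type", lit("this is type"))
val keepValueAndSeq: Step = _.select("value", "seq")

// Hypothetical condition (assumption): decide at runtime whether to narrow the columns.
val narrow = true
val steps: Seq[Step] =
  Seq(addSeq, addType) ++ (if (narrow) Seq(keepValueAndSeq) else Seq.empty)

// Function.chain composes the steps left to right into a single function.
val df2 = Function.chain(steps)(df)
```

Because the steps are ordinary values, they can be kept in a Map keyed by name, filtered, or reordered before being applied.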

Hope this helps!


6 Comments

Thanks Shankar!! Is there a way to dynamically execute a statement in Spark? I mean, if I prepare the statement and store it in a variable, can I execute it?
What kind of statement do you want to store in a variable? If it's a function that you want to apply, you surely can use it.
For example, I have a mix of operations like "withColumn" and "select" to be performed on a df, and they may vary based on my input. Any suggestion on how I can approach this?
I don't think that can be achieved because they are functions and they can't be called dynamically.
@koiralo could you please advise what is wrong here in the reduce function? stackoverflow.com/questions/63843599/…