val columnName = Seq("col1", "col2", ..., "coln")

Is there a way to do a dataframe.select operation that returns a dataframe containing only the specified column names? I know I can do dataframe.select("col1", "col2", ...), but the columnName sequence is generated at runtime. I could call dataframe.select() repeatedly for each column name in a loop, but would that have any performance overhead? Is there a simpler way to accomplish this?

2 Comments

  • duplicate? stackoverflow.com/questions/34938770/… Commented Oct 30, 2017 at 21:01
  • @stuart That is a duplicate of this question. See the timeline. Commented Oct 31, 2017 at 9:04

4 Answers

import org.apache.spark.sql.functions.col

val columnNames = Seq("col1", "col2", ..., "coln")

// using the string column names:
val result = dataframe.select(columnNames.head, columnNames.tail: _*)

// or, equivalently, using Column objects:
val result = dataframe.select(columnNames.map(c => col(c)): _*)

4 Comments

tail returns the sequence excluding the first item (head); : _* transforms a collection into a vararg argument - used when calling a method expecting a vararg, like select does: def select(col: String, cols: String*)
It's called repeated parameters; you can read more about it here - chapter 4, section 2.
@V.Samma that won't compile, check the signatures of select - it's either select(col: String, cols: String*): DataFrame for Strings, or select(cols: Column*): DataFrame for Columns, there's no select(cols: String*): DataFrame. See spark.apache.org/docs/latest/api/scala/…
Is there a way to add alias for other columns like this? dataframe.select(columnNames.head, columnNames.tail: _*, col("abc").as("def")) ?
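The head/tail and `: _*` mechanics from the comments above can be seen without Spark at all. This is a minimal sketch where `select` is a hypothetical stand-in mirroring Spark's `select(col: String, cols: String*)` signature:

```scala
// Hypothetical stand-in for Spark's select(col: String, cols: String*),
// just to illustrate head/tail and vararg expansion; no Spark required.
object VarargsDemo {
  def select(col: String, cols: String*): Seq[String] = col +: cols

  def main(args: Array[String]): Unit = {
    val columnNames = Seq("col1", "col2", "col3")
    // head fills the required first parameter; `tail: _*` expands the
    // remaining elements into the vararg slot
    val selected = select(columnNames.head, columnNames.tail: _*)
    println(selected.mkString(","))  // prints col1,col2,col3
  }
}
```

Dropping the `: _*` here fails to compile for the same reason it does with Spark: the compiler sees one `Seq[String]` argument where a `String*` vararg is expected.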

Since dataFrame.select() expects Column arguments and we have a sequence of strings, we need to convert each string into a Column. columnName.map(name => col(name)): _* builds the columns from the string names and expands them into the varargs that select() expects:

  import org.apache.spark.sql.functions.col

  val columnName = Seq("col1", "col2")
  val DFFiltered = DF.select(columnName.map(name => col(name)): _*)

2 Comments

Please add some context and explanation to this answer.
@UserszrKs I am using Spark version 2.3.1; when I use the above it gives an error: "type mismatch: found: org.apache.spark.sql.Column, required: Seq[?]". What is wrong here?

You can use (List(F.col("*")) ++ updatedColumns): _* in select to keep all existing columns and append derived ones in a single call.

import org.apache.spark.sql.Column
import org.apache.spark.sql.{functions => F}
import org.apache.spark.sql.types.IntegerType

val updatedColumns: List[Column] = inputColumnNames.map(x => (F.col(x) * F.col("is_t90d")).alias(x))

val outputSDF = {
  inputSDF
    .withColumn("is_t90d", F.col("original_date").between(firstAllowedDate, lastAllowedDate).cast(IntegerType))
    .select( // select existing and additional columns
      (List(F.col("*")) ++ updatedColumns): _*
    )
}
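The concatenate-then-expand pattern above can also be sketched in plain Scala. `Col` and `select` below are hypothetical stand-ins for Spark's Column and `select(cols: Column*)`, showing that building the complete argument list first lets a single `: _*` cover both the existing and the derived columns:

```scala
// Hypothetical stand-ins for Column and select(cols: Column*);
// no Spark required.
object CombineDemo {
  final case class Col(name: String)
  def select(cols: Col*): Seq[String] = cols.map(_.name).toSeq

  def main(args: Array[String]): Unit = {
    val existing = List(Col("*"))  // keep every current column
    val updatedColumns = List("a", "b").map(x => Col(x + "_scaled"))
    // concatenate first, then expand the combined list into the varargs
    val result = select((existing ++ updatedColumns): _*)
    println(result.mkString(","))  // prints *,a_scaled,b_scaled
  }
}
```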

Comments


Alternatively, you can also write it like this:

val columnName = Seq("col1", "col2")
val DFFiltered = DF.select(columnName.map(DF(_)): _*)

Comments
