
To select the first column of a DataFrame, you can do this:

df.select(df.columns(0))

df.columns(0) returns a String (the column's name), so select can resolve the column by that name.

Now, suppose I want to select the first 3 columns of the dataset. This is what I would intuitively do:

df.select(df.columns.split(0,3):_*)

As I understand it, the : _* annotation passes the array of strings as varargs, so this should be the same as passing (df.columns(0), df.columns(1), df.columns(2)) to select. However, this doesn't work, and I have to do this instead:

import org.apache.spark.sql.functions.col
df.select(df.columns.split(0,3).map(i => col(i)):_*)

What is going on?

1 Answer

I think in the question you meant slice instead of split.

As for your question: df.columns.slice(0,3):_* can only be passed to a repeated (varargs) parameter, and the expanded sequence must fill that parameter on its own. So if you call select(columns:_*), there must be an overload whose parameter list is a single varargs parameter, e.g. def select(cols: String*).
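To see why the position of the varargs parameter matters, here is a minimal sketch in plain Scala, without Spark. The methods onlyVarargs and fixedThenVarargs are hypothetical stand-ins, not Spark API; the second one only mimics the shape of a method with a fixed first parameter followed by varargs.

```scala
// Hypothetical stand-ins (not Spark API) illustrating how `: _*` binds.
object VarargsDemo {
  // Pure varargs: the whole parameter list is one repeated parameter.
  def onlyVarargs(cols: String*): Int = cols.length

  // One fixed parameter, then varargs (counts all names it receives).
  def fixedThenVarargs(col: String, cols: String*): Int = 1 + cols.length

  val names: Array[String] = Array("a", "b", "c")

  // Compiles: the expanded array fills the single repeated parameter.
  val n1: Int = onlyVarargs(names: _*)

  // fixedThenVarargs(names: _*) would NOT compile, because `: _*`
  // cannot be used to fill the fixed `col` parameter.
  // Supplying the first element explicitly works:
  val n2: Int = fixedThenVarargs(names.head, names.tail: _*)
}
```

Both calls end up receiving all three names; the only difference is how the array has to be split to match each signature.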

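But only one such method can exist here; the two variants cannot be overloaded. After erasure, both String* and Column* become Seq on the JVM, so two select methods with the same result type that differ only in the element type of their varargs parameter would have identical signatures. In the REPL you can see that only one definition survives, because the second one replaces the first: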

def select(cols: String*): String = "string"
select() // returns "string"
def select(cols: Column*): Int = 3
select() // now returns 3

In Spark, that single pure-varargs method is defined for Columns, not Strings:

def select(cols: Column*): DataFrame

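For Strings, the method is declared like this:

def select(col: String, cols: String*): DataFrame

Because the first parameter is a plain String rather than part of the varargs, an Array[String] expanded with :_* cannot fill it, so neither overload applies and the call from the question is rejected.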
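The following sketch mocks the two select shapes in plain Scala so the overload resolution can be tried without a Spark dependency. SelectMock, its Column case class, and the returned strings are hypothetical and only mirror the signatures quoted above; with the real Dataset the same two call shapes apply, and the head/tail form is a commonly used way to pass names as Strings.

```scala
// Hypothetical mock of the two Spark `select` shapes (not the real Dataset class).
object SelectMock {
  final case class Column(name: String)

  // Mirrors: def select(cols: Column*): DataFrame
  def select(cols: Column*): String =
    "columns: " + cols.map(_.name).mkString(",")

  // Mirrors: def select(col: String, cols: String*): DataFrame
  def select(col: String, cols: String*): String =
    "strings: " + (col +: cols).mkString(",")

  val firstThree: Array[String] = Array("id", "name", "age", "city").slice(0, 3)

  // select(firstThree: _*) does not compile: no overload takes a single String*.

  // Option 1: map the names to Columns, so the Column* overload applies.
  val viaColumns: String = select(firstThree.map(Column(_)): _*)

  // Option 2: pass the first name separately and the rest as varargs.
  val viaStrings: String = select(firstThree.head, firstThree.tail: _*)
}
```

Option 2 assumes the array is non-empty, since head throws on an empty array.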

I suggest sticking to Columns, as you do now, but with a bit of syntactic sugar:

df.select(df.columns.slice(0,3).map(col):_*)

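Or, if you need to pass column names as Strings, you can use selectExpr, which takes pure varargs (exprs: String*). Note that it parses each string as a SQL expression, so names containing spaces or dots must be wrapped in backticks: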

df.selectExpr(df.columns.slice(0,3):_*)