If you want to select the first column of a DataFrame, this can be done:
df.select(df.columns(0))
df.columns(0) returns a String, so by passing the name of the column, select is able to pick the column correctly.
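For context, here is a minimal runnable sketch of that single-column case (the SparkSession setup and the column names id, label, score are just assumptions for illustration):

import org.apache.spark.sql.SparkSession

// Minimal sketch: local SparkSession and a tiny illustrative DataFrame.
val spark = SparkSession.builder().master("local[*]").appName("example").getOrCreate()
import spark.implicits._

val df = Seq((1, "a", 1.0), (2, "b", 2.0)).toDF("id", "label", "score")

// df.columns is an Array[String]; df.columns(0) is the String "id",
// and select has an overload that accepts a column name as a String.
df.select(df.columns(0)).show()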
Now, suppose I want to select the first 3 columns of the DataFrame. This is what I would intuitively do:
df.select(df.columns.slice(0, 3): _*)
To my understanding, the : _* operator would pass the array of strings as varargs, so it would be the same as passing (df.columns(0), df.columns(1), df.columns(2)) to the select statement. However, this doesn't work, and it is necessary to do this instead:
import org.apache.spark.sql.functions.col
df.select(df.columns.slice(0, 3).map(i => col(i)): _*)
What is going on?
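For completeness, a self-contained version of the working call, reusing the same illustrative DataFrame as above (the comments only describe what each step does):

import org.apache.spark.sql.functions.col

// slice(0, 3) takes the first three column names as an Array[String];
// map(c => col(c)) turns each String into a Column;
// : _* expands the resulting Array[Column] into the Column* varargs of select.
df.select(df.columns.slice(0, 3).map(c => col(c)): _*).show()

// Equivalent shorthand using eta-expansion of col:
df.select(df.columns.slice(0, 3).map(col): _*).show()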