I'm trying to select a list of columns from a DataFrame using the Java API.

Sample Java Code:

List<String> colList = Arrays.asList(new String[] { "column1", "column2", "column3" });
df.selectExpr(colList.toArray(new String[0])).show();

In the Java API, I have to use selectExpr instead of select. Is there any other way to select a list of columns using the Java API?

In Scala, however, I can do something like the following.

Sample Scala Code:

val colList = List("column1", "column2", "column3")
df.select(colList.head, colList.tail: _*).show

1 Answer

You can use an array of String:

String[] colList = { "column1", "column2", "column3" };
// select(String col, String... cols) takes the first name separately from the rest.
String first = colList[0];
String[] rest = Arrays.copyOfRange(colList, 1, colList.length);

logData.select(first, rest);

or an array of Column:

import static org.apache.spark.sql.functions.col;
import org.apache.spark.sql.Column;

Column[] colList = { col("column1"), col("column2"), col("column3") };
logData.select(colList);
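
If the column names arrive as a List<String>, as in the question, the same first/rest split can be done without hard-coding the array. Below is a minimal sketch in plain Java, assuming a non-empty list. `ColumnArgs`, `first`, `rest`, and `describeSelect` are hypothetical names introduced for illustration; `describeSelect` is only a stand-in with the same parameter shape as `select(String, String...)`, so the example runs without Spark.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ColumnArgs {
    // First column name, for the leading String parameter of select.
    static String first(List<String> cols) {
        if (cols.isEmpty()) {
            throw new IllegalArgumentException("at least one column is required");
        }
        return cols.get(0);
    }

    // Remaining column names, for the String... varargs parameter.
    // Returns an empty array when the list has exactly one element.
    static String[] rest(List<String> cols) {
        if (cols.isEmpty()) {
            throw new IllegalArgumentException("at least one column is required");
        }
        return cols.subList(1, cols.size()).toArray(new String[0]);
    }

    // Hypothetical stand-in (NOT a Spark method): same parameter shape as
    // select(String col, String... cols), returns the names it received.
    static String describeSelect(String col, String... cols) {
        List<String> all = new ArrayList<>();
        all.add(col);
        all.addAll(Arrays.asList(cols));
        return String.join(", ", all);
    }

    public static void main(String[] args) {
        List<String> colList = Arrays.asList("column1", "column2", "column3");
        System.out.println(describeSelect(first(colList), rest(colList)));
    }
}
```

With a real DataFrame the call would be `logData.select(ColumnArgs.first(colList), ColumnArgs.rest(colList))`. For the Column variant, something like `colList.stream().map(functions::col).toArray(Column[]::new)` should build the array from the same list, and `.as(alias)` could be applied per column inside the map step; that part is untested here.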
3 Comments

The first approach does not pass an array of String as-is: you are programmatically building first and rest. The second approach is again something we need to build from the list of column names given as Strings.
Thanks a lot! This is a lovely and straightforward way. NOT.
@Alper, nice. My requirement is that the columns will change: sometimes I get col1, col2 and sometimes I get col1, col2, col3. How can I do this dynamically with the second approach? I also need to add column aliases from params. I'd appreciate your input on this.
