
I'm using spark-sql 2.4.1 with Java 8.

I have a dynamic list of columns that is passed into my function.

i.e.

List<String> cols = Arrays.asList("col_1","col_2","col_3","col_4");
Dataset<Row> df = ...; // has the above columns plus "id" and "name", plus many other columns

I need to select cols plus "id" and "name".

I am doing it as below:

Dataset<Row> res_df = df.select("id", "name", cols.stream().toArray(String[]::new));

This gives a compilation error, so how do I handle this use case?
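For context, Dataset.select(String col, String... cols) is a varargs method: Java lets you pass either separate strings or a single String[] for the varargs slot, but not a mix of both, which is why the call above does not compile. A minimal sketch of one workaround, assuming the df and cols above, merges the extra names into one array first:

import java.util.stream.Stream;

// Merge the fixed column name with the dynamic ones into a single String[]
// so the array alone fills the varargs parameter of select(String, String...).
String[] rest = Stream.concat(Stream.of("name"), cols.stream())
        .toArray(String[]::new);
Dataset<Row> res_df = df.select("id", rest);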

What I tried:

When I do something like the following:

List<String> cols = new ArrayList<>(Arrays.asList("col_1","col_2","col_3","col_4"));
cols.add("id");
cols.add("name");

I get this error:

Exception in thread "main" java.lang.UnsupportedOperationException
    at java.util.AbstractList.add(AbstractList.java:148)
    at java.util.AbstractList.add(AbstractList.java:108)
1 Comment

You get the UnsupportedOperationException because the actual type of the List you're using is the fixed-size list returned by Arrays.asList (java.util.Arrays$ArrayList), not java.util.ArrayList.
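To illustrate that comment, a minimal plain-Java sketch (no Spark involved):

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

List<String> fixed = Arrays.asList("col_1", "col_2");
// fixed.add("id"); // would throw UnsupportedOperationException:
//                  // Arrays.asList returns a fixed-size view over the array

List<String> growable = new ArrayList<>(fixed); // copy into a real ArrayList
growable.add("id");   // fine
growable.add("name"); // fine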

2 Answers


You could create an array of Columns and pass it to the select statement.

import org.apache.spark.sql.*;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Start from a mutable copy so add() works (see the comment above).
List<String> cols = new ArrayList<>(Arrays.asList("col_1", "col_2", "col_3", "col_4"));
cols.add("id");
cols.add("name");

// Map each name to a Column and collect into a Column[],
// which matches the select(Column... cols) overload.
Column[] cols2 = cols.stream()
        .map(Column::new)
        .toArray(Column[]::new);

df.select(cols2).show();
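As a side note, if you only need plain column names, selectExpr(String... exprs) takes strings directly, so the mapping to Column can be skipped entirely (a sketch, same df and cols as above):

// selectExpr accepts SQL expression strings; bare column names qualify.
df.selectExpr(cols.toArray(new String[0])).show();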

1 Comment

Try changing List to ArrayList<String> cols, and add these imports: import java.util.ArrayList; import java.util.Arrays; import java.util.List;

There are several ways to achieve this, relying on the different select method signatures.

One possible solution, under the assumption that the cols list is immutable and not controlled by your code:

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

import scala.collection.JavaConverters;

public class ATest {
    public static void main(String[] args) {
        SparkSession spark = SparkSession
                .builder()
                .appName("Java Spark SQL basic example")
                .master("local[2]")
                .getOrCreate();

        List<String> cols = Arrays.asList("col_1", "col_2");

        Dataset<Row> df = spark.sql(
                "select 42 as ID, 'John' as NAME, 1 as col_1, 2 as col_2, 3 as col_3, 4 as col_4");
        df.show();

        // Prepend the fixed name(s) to the dynamic list in a mutable copy.
        ArrayList<String> newCols = new ArrayList<>();
        newCols.add("NAME");
        newCols.addAll(cols);

        // select(String, Seq<String>) is the Scala varargs signature, so the
        // Java collection must be converted to a scala.collection.Seq first.
        df.select("ID", JavaConverters.asScalaIteratorConverter(newCols.iterator()).asScala().toSeq())
                .show();
    }
}
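For what it's worth, the iterator-based conversion can also be written with asScalaBufferConverter, which converts the List directly (a sketch under the same assumptions as the example above):

import scala.collection.JavaConverters;
import scala.collection.Seq;

// Wraps the java.util.List as a Scala Buffer; toSeq() produces the Seq
// that the Scala varargs signature select(String, Seq<String>) expects.
Seq<String> scalaCols = JavaConverters.asScalaBufferConverter(newCols).asScala().toSeq();
df.select("ID", scalaCols).show();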

1 Comment

@BdEngineer I've updated the post with a full working example.
