9

In Java, I use RowFactory.create() to create a Row:

Row row = RowFactory.create(record.getLong(1), record.getInt(2), record.getString(3));

Here "record" is a record from a database, but I cannot know its length in advance, so I want to build the Row from a List or an array. In Scala, I can use Row.fromSeq() to create a Row from a List or an Array, but how can I achieve that in Java?

4 Answers

16

We often need to create Datasets or DataFrames in real-world applications. Here is an example of how to create Rows and a Dataset in a Java application:

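// Assumes: import static org.apache.spark.sql.types.DataTypes.*;
// (provides createStructField, StringType, IntegerType)
// ImmutableList comes from Guava: com.google.common.collect.ImmutableList

// Initialize the SQLContext first
SQLContext sqlContext = ... 
// Schema: one StructField per column, all non-nullable
StructType schemata = DataTypes.createStructType(
        new StructField[]{
                createStructField("NAME", StringType, false),
                createStructField("STRING_VALUE", StringType, false),
                createStructField("NUM_VALUE", IntegerType, false),
        });
// Each Row must supply its values in the same order as the schema fields
Row r1 = RowFactory.create("name1", "value1", 1);
Row r2 = RowFactory.create("name2", "value2", 2);
List<Row> rowList = ImmutableList.of(r1, r2);
Dataset<Row> data = sqlContext.createDataFrame(rowList, schemata);
data.show();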
+-----+------------+---------+
| NAME|STRING_VALUE|NUM_VALUE|
+-----+------------+---------+
|name1|      value1|        1|
|name2|      value2|        2|
+-----+------------+---------+

2 Comments

Thank you. In Scala we would just do sc.parallelize(List((x,y),(a,b))).toDF("col1","col2"), which is so simple. Why all these Row, JavaRDD, etc.? Is there any way as simple as that?
You say you need to create a Dataset in real-world applications, yet you hard-code all the values. That does not make sense: in the real world everything has to be parameterizable, and you do not know the values beforehand.
12

I am not sure I understand your question correctly, but you can use RowFactory to create a Row from an ArrayList in Java.

List<MyData> mlist = new ArrayList<MyData>();
mlist.add(d1);
mlist.add(d2);

// toArray() returns an Object[], which is passed as the varargs array
Row row = RowFactory.create(mlist.toArray());
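
For what it's worth, the toArray() call is what matters here. RowFactory.create is declared with a varargs parameter (Object... values), so a List passed directly counts as one value, while an Object[] is used as the values array itself. Below is a minimal plain-Java sketch of that behavior. Note that VarargsDemo and its create method are hypothetical stand-ins with the same parameter shape, not Spark code.

```java
import java.util.ArrayList;
import java.util.List;

public class VarargsDemo {
    // Hypothetical stand-in: same parameter shape as RowFactory.create(Object... values).
    // It simply returns the array that the varargs call received.
    public static Object[] create(Object... values) {
        return values;
    }

    public static void main(String[] args) {
        List<Object> fields = new ArrayList<>();
        fields.add(1L);
        fields.add(2);
        fields.add("three");

        // A List is not an Object[], so it is wrapped as a single element.
        Object[] wrapped = create(fields);
        // An Object[] is passed through as the varargs array itself.
        Object[] spread = create(fields.toArray());

        System.out.println("wrapped length: " + wrapped.length); // wrapped length: 1
        System.out.println("spread length: " + spread.length);   // spread length: 3
    }
}
```

So for a record of unknown length, collecting the values into a List<Object> and calling toArray() on it at the end should give one Row field per value.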

4 Comments

Hi, when I use your method, I found that Spark treats mlist as a single object: Row row = RowFactory.create(mlist); System.out.println("row number:" + row.length()); System.out.println("mlist number:" + mlist.size()); I got: row number:1 mlist number:2
Yes, but the Row will have both records. You can try printing System.out.println("row number:" + row.toSeq());
Hi, thanks so much! You can also try this: Object[] rowArray = {obj1, obj2, ....}; Row row = RowFactory.create(rowArray); System.out.println("row number:" + row.length()); You will get: row number:6
Thanks. I updated my answer. I checked the source code for the RowFactory and GenericRow classes: "An internal row implementation that uses an array of objects as the underlying storage."
0

// Create a list of DTOs
List<MyDTO> dtoList = Arrays.asList(.....);

// Create a Dataset of DTOs (MyDTO must be a Java bean for Encoders.bean)
Dataset<MyDTO> dtoSet = sparkSession.createDataset(dtoList,
                Encoders.bean(MyDTO.class));

// If you need a Dataset of Row
Dataset<Row> rowSet = dtoSet.select("col1", "col2", "col3");

-1

For simple list values you can use Encoders:

 List<Row> rows = ImmutableList.of(RowFactory.create(new Timestamp(currentTime)));
 Dataset<Row> input = sparkSession.createDataFrame(rows, Encoders.TIMESTAMP().schema());
