There are many examples of how to create an empty DataFrame/Dataset using Spark with Scala or Python, but I would like to know how to create one in Java Spark.

I need to create an empty DataFrame with a single column whose header is Column_1 and whose type is String.

2 Answers

Another approach to creating an empty Dataset with a specified schema in Java is given in this answer.

Once you have created a schema of type StructType, pass it to createDataFrame together with an empty list:

Dataset<Row> emptyDataSet = spark.createDataFrame(new ArrayList<>(), schema);
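
For completeness, here is a minimal sketch of that approach with the schema from the question built inline. The class name EmptyDataFrameExample, the helper createEmpty, and the local[1] master are assumptions made for illustration; they do not come from the answer above.

```java
import java.util.ArrayList;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

public class EmptyDataFrameExample {

    // Hypothetical helper: returns a DataFrame with zero rows and a single
    // nullable string column named "Column_1".
    static Dataset<Row> createEmpty(SparkSession spark) {
        StructType schema = new StructType()
                .add("Column_1", DataTypes.StringType, true);
        // An empty list of rows plus the schema gives an empty DataFrame.
        return spark.createDataFrame(new ArrayList<Row>(), schema);
    }

    public static void main(String[] args) {
        // Assumption: a local session is enough for trying this out.
        SparkSession spark = SparkSession.builder()
                .appName("empty-dataframe-example")
                .master("local[1]")
                .getOrCreate();

        Dataset<Row> df = createEmpty(spark);
        df.printSchema();
        System.out.println("rows: " + df.count());

        spark.stop();
    }
}
```

Writing `new ArrayList<Row>()` with an explicit type argument, rather than the diamond form, is only a readability choice here; it makes it obvious which createDataFrame overload is meant.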

Alternative 1: create an empty DataFrame with a user-defined schema

// alternative - 1
StructType s = new StructType()
        .add(new StructField("Column_1", DataTypes.StringType, true, Metadata.empty()));
Dataset<Row> csv = spark.read().schema(s).csv(spark.emptyDataset(Encoders.STRING()));
csv.show(false);
csv.printSchema();
/**
 * +--------+
 * |Column_1|
 * +--------+
 * +--------+
 *
 * root
 *  |-- Column_1: string (nullable = true)
 */

Alternative 2: create a DataFrame with a single null value and a user-defined schema (this yields one row containing null, not an empty DataFrame)

Dataset<Row> df4 = spark.sql("select cast(null as string) Column_1");
df4.show(false);
df4.printSchema();
/**
 * +--------+
 * |Column_1|
 * +--------+
 * |null    |
 * +--------+
 *
 * root
 *  |-- Column_1: string (nullable = true)
 */
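
As the output shows, the SQL query returns one row holding null. If a truly empty result is needed, one possible variation is to drop that row with limit(0), which keeps the schema. This is a sketch rather than part of the original answer; the class name EmptySqlExample, the helper emptyFromSql, and the local[1] master are assumptions made for illustration.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class EmptySqlExample {

    // Hypothetical helper: same query as above, but limit(0) discards the
    // single null row while keeping the Column_1 string column.
    static Dataset<Row> emptyFromSql(SparkSession spark) {
        return spark.sql("select cast(null as string) Column_1").limit(0);
    }

    public static void main(String[] args) {
        // Assumption: a local session is enough for trying this out.
        SparkSession spark = SparkSession.builder()
                .appName("empty-sql-example")
                .master("local[1]")
                .getOrCreate();

        Dataset<Row> df = emptyFromSql(spark);
        df.printSchema();
        System.out.println("rows: " + df.count());

        spark.stop();
    }
}
```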

Alternative 3: create an empty DataFrame from an empty RDD and a user-defined schema

ClassTag<Row> rowTag = scala.reflect.ClassTag$.MODULE$.apply(Row.class);
Dataset<Row> df5 = spark.createDataFrame(spark.sparkContext().emptyRDD(rowTag),
        new StructType()
                .add(new StructField("Column_1", DataTypes.StringType, true, Metadata.empty())));
df5.show(false);
df5.printSchema();
/**
 * +--------+
 * |Column_1|
 * +--------+
 * +--------+
 *
 * root
 *  |-- Column_1: string (nullable = true)
 */

Use spark.emptyDataFrame() to create a DataFrame without any columns or rows

Dataset<Row> rowDataset = spark.emptyDataFrame();
rowDataset.show(false);
rowDataset.printSchema();
/**
 * ++
 * ||
 * ++
 * ++
 *
 * root
 */

3 Comments

Good one Someshwar
Added one more approach.
I tried Alternative 3, but it fails with a SparkException: job aborted due to stage failure.
