Convert ListBuffer of Dataframes into one single Dataframe Spark Scala

Question

I have a ListBuffer of 30 DataFrames with the same fields, and I want to append them all into a single DataFrame at once. What is the best and most efficient way to do this?

var result_df_list = new ListBuffer[DataFrame]()

I have seen that you can combine a Seq of DataFrames like this:

val newDFs = Seq(DF1,DF2,DF3)
newDFs.reduce(_ union _)

But how can you achieve this with a ListBuffer?

It works the same for ListBuffer as for Seq.

– pasha701, Commented Sep 29, 2019 at 15:19
A ListBuffer IS a Seq.

– Raphael Roth, Commented Sep 29, 2019 at 19:55

werner · Accepted Answer · 2019-09-29 17:16:34Z

2

The reduce method of ListBuffer works as expected.

Running

val unioned = result_df_list.reduce(_ union _)
unioned.explain()

results in a clean execution plan, with a single Union over all the inputs:

== Physical Plan ==
Union
:- LocalTableScan [value#1]
:- LocalTableScan [value#5]
+- LocalTableScan [value#9]
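
One caveat with reduce is that it throws an exception on an empty collection. A minimal sketch of a safer variant using reduceOption follows; the helper name unionAll, the local SparkSession, and the sample data are assumptions made for illustration and are not part of the original answer.

```scala
import scala.collection.mutable.ListBuffer
import org.apache.spark.sql.{DataFrame, SparkSession}

// Assumption: a local SparkSession, created here only to keep the sketch self-contained.
val spark = SparkSession.builder().master("local[*]").appName("union-sketch").getOrCreate()
import spark.implicits._

// Hypothetical helper: returns None for an empty collection instead of throwing.
def unionAll(dfs: Iterable[DataFrame]): Option[DataFrame] =
  dfs.reduceOption(_ union _)

// Illustrative data only.
val filled = ListBuffer(Seq("1").toDF("value"), Seq("2").toDF("value"))
val empty  = ListBuffer.empty[DataFrame]

val combined = unionAll(filled) // Some(...) holding the unioned DataFrame
val nothing  = unionAll(empty)  // None, no exception
```

Taking an Iterable rather than a ListBuffer lets the same helper accept a Seq or List as well.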

Aleh Pranovich · Accepted Answer · 2019-09-29 17:39:37Z

2

You can also use reduce() with ListBuffer.

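  import scala.collection.mutable.ListBuffer
  import org.apache.spark.sql.DataFrame
  import spark.implicits._  // assumes a SparkSession named spark is in scope

  val result_df_list = new ListBuffer[DataFrame]()

  val df1 = Seq("1").toDF("value")
  val df2 = Seq("2").toDF("value")
  val df3 = Seq("3").toDF("value")

  result_df_list += df1
  result_df_list += df2
  result_df_list += df3

  // Fold the buffer pairwise into one DataFrame, matching columns by name
  val df_united: DataFrame = result_df_list.reduce(_ unionByName _)

  df_united.show()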

Result:

+-----+
|value|
+-----+
|    1|
|    2|
|    3|
+-----+
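
The choice between union and unionByName matters when the DataFrames have the same columns in a different order: union resolves columns by position, while unionByName (available since Spark 2.3) resolves them by name. A small sketch of the difference, assuming a local SparkSession and made-up column names id and label:

```scala
import org.apache.spark.sql.SparkSession

// Assumption: a local SparkSession, created here only to keep the sketch self-contained.
val spark = SparkSession.builder().master("local[*]").appName("union-by-name-sketch").getOrCreate()
import spark.implicits._

// Same columns, different order (illustrative data only).
val a = Seq(("1", "x")).toDF("id", "label")
val b = Seq(("y", "2")).toDF("label", "id")

val byPosition = a.union(b)       // b's "label" values end up in the "id" column
val byName     = a.unionByName(b) // columns are matched by name

byPosition.show()
byName.show()
```

If all 30 DataFrames are built by the same code path with an identical column order, both methods should give the same result.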

Mukul · Accepted Answer · 2020-03-27 13:09:20Z

0

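You can use a MutableList. With spark.implicits._ in scope, the toDF method converts the elements of the list into a DataFrame, and toDS converts them into a Dataset. Note that this applies to a list of plain values or case-class rows, not to a list of DataFrames.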

Might help to post example usages of what you explained.
– Andrew Nolan, Commented over a year ago

Hitesh · Accepted Answer · 2019-09-29 16:00:37Z

-1

You can convert the ListBuffer to a List by calling its toList method, and then use the reduce method on the resulting List.
