1

I have a ListBuffer of 30 DataFrames with the same fields, and I want to 'append' them all at once. What is the best and most efficient way to do this?

var result_df_list = new ListBuffer[DataFrame]()

I have seen that you can create a Seq of DataFrames like this:

val newDFs = Seq(DF1,DF2,DF3)
newDFs.reduce(_ union _)

But how can you achieve this with a ListBuffer?

2
  • for ListBuffer the same as for Seq Commented Sep 29, 2019 at 15:19
  • 1
    a ListBuffer IS a Seq Commented Sep 29, 2019 at 19:55

4 Answers

2

The reduce method of ListBuffer works as expected.

Running

val unioned = result_df_list.reduce(_ union _)
unioned.explain()

results in a flat execution plan with a single Union node:

== Physical Plan ==
Union
:- LocalTableScan [value#1]
:- LocalTableScan [value#5]
+- LocalTableScan [value#9]
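
One caveat with a plain reduce: it throws an UnsupportedOperationException when the buffer is empty. A minimal sketch of a guard, assuming Spark is on the classpath; the object UnionAll and its method unionAll are hypothetical names for illustration, not part of the Spark API:

```scala
import scala.collection.mutable.ListBuffer
import org.apache.spark.sql.DataFrame

// Hypothetical helper, not part of the Spark API.
object UnionAll {
  // reduceOption yields None for an empty collection instead of throwing,
  // so the caller decides how to handle "nothing to union".
  def unionAll(dfs: scala.collection.Seq[DataFrame]): Option[DataFrame] =
    dfs.reduceOption(_ union _)
}
```

Calling UnionAll.unionAll(result_df_list) should then give Some(unioned) for a non-empty buffer and None otherwise.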

2

You can also use reduce() with ListBuffer.

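  import scala.collection.mutable.ListBuffer
  import org.apache.spark.sql.DataFrame
  import spark.implicits._ // assumes a SparkSession named spark is in scope, as in spark-shell

  var result_df_list = new ListBuffer[DataFrame]()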

  val df1 = Seq("1").toDF("value")
  val df2 = Seq("2").toDF("value")
  val df3 = Seq("3").toDF("value")

  result_df_list += df1
  result_df_list += df2
  result_df_list += df3

  val df_united: DataFrame = result_df_list.reduce(_ unionByName _)

  df_united.show()

Result:

+-----+
|value|
+-----+
|    1|
|    2|
|    3|
+-----+
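
This answer uses unionByName where the question used union. The two differ in how columns are lined up: union goes by position, unionByName by column name. A rough sketch of the difference, assuming a local SparkSession and made-up column names (id, label) chosen only for illustration:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object UnionByNameSketch {
  def main(args: Array[String]): Unit = {
    // Assumption: a throwaway local session just for this illustration.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("union-by-name-sketch")
      .getOrCreate()
    import spark.implicits._

    val a = Seq(("1", "x")).toDF("id", "label")
    val b = Seq(("y", "2")).toDF("label", "id") // same fields, different order

    // unionByName lines columns up by name, so b's "id" value stays under "id".
    val byName: DataFrame = a.unionByName(b)
    byName.show()

    // union lines columns up by position, so b's "label" value "y" lands under "id".
    val byPosition: DataFrame = a.union(b)
    byPosition.show()

    spark.stop()
  }
}
```

If all 30 DataFrames are known to share the same column order, either method should work with reduce; unionByName is the safer choice when the order might differ.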

0

You can use a MutableList. The toDF method can then be called on the mutable list to convert it into a DataFrame or Dataset.

1 Comment

Might help to post example usages of what you explained.
-1

You can try converting your ListBuffer to a List by calling its toList method, and then use the reduce method on the result.
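
The conversion is optional, since reduce is also defined on ListBuffer itself; toList mainly gives you an immutable copy. A small sketch with stand-in data (Ints instead of DataFrames, + instead of union), where the object name ToListSketch is made up for the example:

```scala
import scala.collection.mutable.ListBuffer

object ToListSketch {
  // Stand-in data: Ints instead of DataFrames, + instead of union.
  val buffer: ListBuffer[Int] = ListBuffer(1, 2, 3)

  // reduce called on the ListBuffer itself
  val direct: Int = buffer.reduce(_ + _)

  // reduce called after copying the elements into an immutable List
  val viaList: Int = buffer.toList.reduce(_ + _)
}
```

Both values come out the same, so for DataFrames the choice is a matter of preference rather than correctness.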
