
When I run this code, I get an "empty collection" error in some cases.

    val result = df
                  .filter(col("channel_pk") === "abc")
                  .groupBy("member_PK")
                  .agg(sum(col("price") * col("quantityOrdered")) as "totalSum")
                  .select("totalSum")
                  .rdd.map(_ (0).asInstanceOf[Double]).reduce(_ + _)

The error happens at this line:

.rdd.map(_ (0).asInstanceOf[Double]).reduce(_ + _)

When collection is empty, I want result to be equal to 0. How can I do it?

  • before you convert to rdd, did you check if the dataframe has rows or not? I guess it's empty Commented Apr 26, 2018 at 12:58
  • @RameshMaharjan: yes, my question is how can I check that it's empty and return 0 if it's empty? Commented Apr 26, 2018 at 13:01
  • @RameshMaharjan: Please check my question (the last line). I somehow deleted it from the question. Thanks. Commented Apr 26, 2018 at 13:03
  • @RameshMaharjan: So, I don't understand how can I get 0 from assert example of Noam. Commented Apr 26, 2018 at 13:03
  • @RameshMaharjan: if (assert(df.take(1).isEmpty)) result = 0 else result ... ? Commented Apr 26, 2018 at 13:06

2 Answers


When collection is empty, I want result to be equal to 0. How can I do it?

Before you reduce, check whether the aggregated dataframe has any rows. Note that checking the original `df` is not enough: the filter may remove every row even when `df` itself is non-empty, so the check has to run on the result of the aggregation.

    val totals = df
      .filter(col("channel_pk") === "abc")
      .groupBy("member_PK")
      .agg(sum(col("price") * col("quantityOrdered")) as "totalSum")
      .select("totalSum")

    val result = if (totals.take(1).isEmpty) 0.0
                 else totals.rdd.map(_.getDouble(0)).reduce(_ + _)

or you can use `count` instead of `take(1)`, though `count` scans the whole dataset while `take(1)` stops at the first row. Again, run the check on the aggregated result rather than on `df` itself:

    val aggregated = df
      .filter(col("channel_pk") === "abc")
      .groupBy("member_PK")
      .agg(sum(col("price") * col("quantityOrdered")) as "totalSum")
      .select("totalSum")

    val result = if (aggregated.count() == 0) 0.0
                 else aggregated.rdd.map(_.getDouble(0)).reduce(_ + _)
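As a sketch of an alternative that avoids the emptiness check entirely: Spark's `RDD.fold` takes a zero element and returns it when the RDD is empty, whereas `reduce` throws. Plain Scala collections follow the same contract, so the behavior can be demonstrated without a Spark session:

    import scala.util.Try

    val empty = Seq.empty[Double]

    // reduce has no element to start from, so it fails on an empty collection
    assert(Try(empty.reduce(_ + _)).isFailure)

    // fold starts from the supplied zero and just returns it when there is nothing to add
    assert(empty.fold(0.0)(_ + _) == 0.0)

    // on non-empty input both produce the same sum
    val prices = Seq(1.5, 2.5, 4.0)
    assert(prices.fold(0.0)(_ + _) == prices.reduce(_ + _))

With Spark, the last line of the original pipeline would become `.rdd.map(_.getDouble(0)).fold(0.0)(_ + _)`, which yields 0.0 for an empty result without a separate `take(1)` or `count()` job.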

The error appears only at that line because it is the first action you trigger; before it, Spark doesn't execute anything (lazy evaluation). Your `df` is simply empty. You can verify this by adding `assert(!df.take(1).isEmpty)` before the pipeline.
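The laziness described here can be mimicked with plain Scala views (a rough analogy, not Spark itself): building the pipeline executes nothing, and the work, along with any failure, only happens at the terminal operation:

    var ran = false

    // like a Spark transformation chain: defining it runs nothing
    val pipeline = Seq(1, 2, 3).view.map { x => ran = true; x * 2 }
    assert(!ran)

    // like a Spark action: reduce forces evaluation
    val total = pipeline.reduce(_ + _)
    assert(ran)
    assert(total == 12)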

2 Comments

I noticed that I somehow deleted the last line of my message. There was a question: When collection is empty, I want result to be equal to 0. How can I do it?
Then replace the assert with an if statement that returns 0 when it's empty
