I am new to Spark and Scala. I want to sum up all the values present in the RDD. Below is an example. The RDD holds key-value pairs, and suppose that after some joins and transformations it has three records as below, where A is the key:

(A, List(1,1,1,1,1,1,1))
(A, List(1,1,1,1,1,1,1))
(A, List(1,1,1,1,1,1,1))

Now I want to sum each value with the corresponding value in the other records, so the output should look like:

(A, List(3,3,3,3,3,3,3))

Can anyone please help me out with this? Is there a way to achieve this using Scala?

Thanks in advance.

  • I tried to group them all and then add the elements by position, but could not get the required result. – Commented Jul 2, 2016 at 12:42

1 Answer


A naive approach is to use reduceByKey:

rdd.reduceByKey(
  (xs, ys) => xs.zip(ys).map { case (x, y) => x + y }
)
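For reference, here is how that plays out on the sample data. This is a minimal sketch, assuming an existing SparkContext named sc:

val rdd = sc.parallelize(Seq(
  ("A", List(1, 1, 1, 1, 1, 1, 1)),
  ("A", List(1, 1, 1, 1, 1, 1, 1)),
  ("A", List(1, 1, 1, 1, 1, 1, 1))
))

rdd.reduceByKey(
  (xs, ys) => xs.zip(ys).map { case (x, y) => x + y }
).collect()
// Array((A,List(3, 3, 3, 3, 3, 3, 3)))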

but it is rather inefficient because it creates a new List on each merge.

You can improve on that by using, for example, aggregateByKey with a mutable buffer:

rdd.aggregateByKey(Array.fill(7)(0))( // Zero value: a mutable buffer
  // For seqOp we mutate the accumulator in place
  (acc, xs) => {
    for {
      (x, i) <- xs.zipWithIndex
    } acc(i) += x
    acc
  },
  // For performance you could modify acc1 in place as above
  (acc1, acc2) => acc1.zip(acc2).map { case (x, y) => x + y }
).mapValues(_.toList)
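If you also want to avoid the allocation in the merge step, the in-place variant of combOp would look like the sketch below. It is safe here because every buffer starts from the same length-7 zero array:

(acc1, acc2) => {
  for (i <- acc2.indices) acc1(i) += acc2(i)  // mutate acc1 in place, no new array
  acc1
}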

It should also be possible to use DataFrames, but recent versions schedule these aggregations separately by default, so without adjusting the configuration it is probably not worth the effort.
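For the record, one possible DataFrame formulation is to explode the lists with their positions, sum per position, and reassemble the array. This is only a sketch, assuming a SparkSession named spark and Spark 2.1+ (for posexplode):

import org.apache.spark.sql.functions._
import spark.implicits._

val df = rdd.toDF("key", "values")

df.select($"key", posexplode($"values"))              // columns: key, pos, col
  .groupBy($"key", $"pos")
  .agg(sum($"col").as("total"))                       // sum each position across records
  .groupBy($"key")
  .agg(sort_array(collect_list(struct($"pos", $"total"))).as("pairs"))
  .select($"key", $"pairs.total".as("sums"))          // totals in position order
// +---+---------------------+
// |key|sums                 |
// +---+---------------------+
// |A  |[3, 3, 3, 3, 3, 3, 3]|
// +---+---------------------+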
