
I would like to convert the following for loop into a functional Scala method.

for (i <- 15 to 25){
  count_table_rdd = count_table_rdd.union(training_data.map(line => (i+"_"+line(i)+"_"+line(0), 1)).reduceByKey(_ + _))
}

I have tried looking at the foreach method, but I do not want to transform every item, just indices 15 through 25.

2 Comments
  • Scala collections have a slice(from: Int, to: Int) method, so if you slice and then foreach you could be all set. Commented Apr 10, 2015 at 18:02
  • Do you really need the value of i in your actual use case? Or just line(i)? Commented Apr 10, 2015 at 21:13

3 Answers


You can fold.

val result = (count_table_rdd /: (15 to 25)){ (c, i) => c.union(...) }

If you see that you've got a set of data and you're pushing a value through it doing updates to that value, you should reach for a fold because that's exactly what it does.
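To make the shape of the fold concrete, here is a minimal, runnable sketch using plain Scala collections: Maps stand in for the RDDs, the data and the column range (1 to 2) are made up, and countsFor plays the role of the per-index map/reduceByKey step.

```scala
// Sketch: foldLeft over the index range, merging each index's counts into an
// accumulator Map. Maps stand in for RDDs; the data below is hypothetical.
object FoldSketch {
  // each "line" is a row of column values; line(0) is the label
  val trainingData: Seq[Vector[String]] = Seq(
    Vector("yes", "a", "b"),
    Vector("no",  "a", "c"),
    Vector("yes", "a", "b")
  )

  // counts of i+"_"+line(i)+"_"+line(0) for one column index i
  // (the reduceByKey analogue for a single i)
  def countsFor(i: Int): Map[String, Int] =
    trainingData
      .map(line => s"${i}_${line(i)}_${line(0)}")
      .groupBy(identity)
      .map { case (k, v) => k -> v.size }

  // fold the range into one accumulated map (the union-of-counts step)
  val result: Map[String, Int] =
    (1 to 2).foldLeft(Map.empty[String, Int]) { (acc, i) =>
      countsFor(i).foldLeft(acc) { case (m, (k, v)) =>
        m.updated(k, m.getOrElse(k, 0) + v)
      }
    }
}
```

Note that because every key embeds the index i, the per-index maps never collide, so merging them behaves like the union in the original loop.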


2 Comments

I thought the domino operator is frowned upon now?
@JustinPihony - Eh, Martin Odersky makes a good case that it very accurately visually represents what's going on. And foldLeft makes you switch the order of arguments in your head between the initial list/parameter and the closure.

You may use @tailrec too, but @rex's method is what you should follow. As written this will not compile; specify the types of your count_table_rdd and res accordingly.

tailrec version :

@annotation.tailrec
def f(start: Int = 15, end: Int = 25, res: List[Your_count_table_rdd_Type] = Nil): List[Your_count_table_rdd_Type] = {
  if (start > end) res  // return the accumulator, not the outer count_table_rdd
  else {
    val temp = res ++ training_data.map(line => (start + "_" + line(start) + "_" + line(0), 1)).reduceByKey(_ + _)
    f(start + 1, end, temp)
  }
}

f()

You can specify start and end too:

f(30, 45)
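The same tail-recursive shape can be shown runnable with plain collections. This is a sketch under made-up data, with a Map standing in for the RDD and countsFor standing in for the map/reduceByKey step; the base case returns the accumulator res.

```scala
// Runnable sketch of the tail-recursive accumulation with plain collections
// (Map in place of an RDD; data and column range are hypothetical).
object TailrecSketch {
  val trainingData: Seq[Vector[String]] = Seq(
    Vector("yes", "a", "b"),
    Vector("no",  "a", "c")
  )

  // per-index counts of i+"_"+line(i)+"_"+line(0)
  def countsFor(i: Int): Map[String, Int] =
    trainingData
      .map(line => s"${i}_${line(i)}_${line(0)}")
      .groupBy(identity)
      .map { case (k, v) => k -> v.size }

  @annotation.tailrec
  def f(start: Int, end: Int, res: Map[String, Int] = Map.empty): Map[String, Int] =
    if (start > end) res  // base case: return the accumulator
    else {
      // merge this index's counts into the accumulator, then recurse
      val merged = countsFor(start).foldLeft(res) { case (m, (k, v)) =>
        m.updated(k, m.getOrElse(k, 0) + v)
      }
      f(start + 1, end, merged)
    }
}
```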

1 Comment

That's not functional; you're mutating count_table_rdd. Try again?

Taking this from the Spark perspective, it could be better to do this by transforming the training_data RDD instead of looping to select given columns.

Something like:

training_data.flatMap(line => (15 to 25).map(i => (i + "_" + line(i) + "_" + line(0), 1)))
  .reduceByKey(_ + _)

This will be more efficient than joining pieces of an RDD together using union, since it makes a single pass over the data.
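The single-pass flatMap idea can be sketched with plain collections (hypothetical data; in Spark, reduceByKey would replace the groupBy/size step):

```scala
// Sketch: one flatMap pass emits a key per (row, index) pair, then counts
// are aggregated. A Seq stands in for the RDD; the data is made up.
object FlatMapSketch {
  val trainingData: Seq[Vector[String]] = Seq(
    Vector("yes", "a", "b"),
    Vector("yes", "a", "b"),
    Vector("no",  "d", "e")
  )

  val counts: Map[String, Int] =
    trainingData
      .flatMap(line => (1 to 2).map(i => s"${i}_${line(i)}_${line(0)}"))
      .groupBy(identity)               // reduceByKey(_ + _) analogue:
      .map { case (k, v) => k -> v.size }
}
```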
