I have a pair RDD of type RDD[(String, Array[String])]. I want to flatten the values so that I get an RDD[(String, String)] in which each element of an Array[String] in the first RDD becomes a separate element of the second RDD.

For instance, my first RDD has the following elements:

("a", Array("x", "y"))
("b", Array("y", "z"))

The result I want is this:

("a", "x")
("a", "y")
("b", "y")
("b", "z")

How can I do this? flatMapValues(f: Array[String] => TraversableOnce[String]) seems to be the right choice here, but what do I need to pass as the argument f?

  • Just do rdd.flatMapValues(x => x) Commented Sep 3, 2015 at 18:32
  • @kaktusito Right thanks; I've updated the question because I was actually looking for the argument to pass into flatMapValues(). You've made that clean. Commented Sep 3, 2015 at 18:40
  • @Carsten I would use identity instead of x => x. The Scala compiler is probably clever enough to recognize x => x as identity, but if it is not, you end up creating a new function object. Commented Sep 3, 2015 at 18:41
  • Is there any difference when using this instead: rdd.flatMap{ case (a,b) => b.map(a->_) }? Does flatMapValues do anything different? Commented Sep 4, 2015 at 7:47
  • @tuxdna There's a performance reason, I believe. flatMap is not guaranteed to keep the partitioner of the original RDD (since there's no way to check that the keys will remain the same), while flatMapValues will. This is important for operations that require shuffling, such as joins. Commented Sep 4, 2015 at 11:09
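
The partitioner point from the last comment can be checked with a small sketch. This is only an illustration under assumptions: the object name PartitionerCheck, the local[2] master, and the HashPartitioner with 2 partitions are made up for the example, and the values are converted with toSeq explicitly rather than relying on the implicit Array conversion.

```scala
import org.apache.spark.{HashPartitioner, SparkConf, SparkContext}

// Hypothetical helper object, not part of the original thread.
object PartitionerCheck {
  // Returns (partitioner kept by flatMapValues, partitioner kept by flatMap).
  def run(): (Boolean, Boolean) = {
    val conf = new SparkConf().setAppName("partitioner-check").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      // Give the pair RDD an explicit partitioner so there is something to preserve.
      val rdd = sc
        .parallelize(Seq(("a", Array("x", "y")), ("b", Array("y", "z"))))
        .partitionBy(new HashPartitioner(2))

      // Keys are untouched, so Spark can keep the partitioner.
      val viaFlatMapValues = rdd.flatMapValues(values => values.toSeq)

      // The function may change keys, so Spark drops the partitioner.
      val viaFlatMap = rdd.flatMap { case (k, vs) => vs.toSeq.map(v => (k, v)) }

      (viaFlatMapValues.partitioner.isDefined, viaFlatMap.partitioner.isDefined)
    } finally {
      sc.stop()
    }
  }
}
```

Both variants produce the same pairs; the difference is only whether a later join or other shuffle-based operation can reuse the existing partitioning.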

1 Answer

To achieve the desired result, do:

import org.apache.spark.rdd.RDD

val rdd1: RDD[(Any, Array[Any])] = ...
val rddFlat: RDD[(Any, Any)] = rdd1.flatMapValues(identity[Array[Any]])

The result looks like the one asked for in the question.
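
For the concrete data in the question, a minimal end-to-end sketch could look like the following. The object name FlattenValuesExample, the app name, and the local[*] master are assumptions for illustration. The sketch also swaps identity for an explicit toSeq, which avoids depending on the implicit conversion from Array to a Scala collection.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

// Hypothetical wrapper object, not part of the original answer.
object FlattenValuesExample {
  // Builds the RDD from the question, flattens its values, and collects the pairs.
  def run(): Seq[(String, String)] = {
    val conf = new SparkConf().setAppName("flatten-values").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      val rdd: RDD[(String, Array[String])] = sc.parallelize(Seq(
        ("a", Array("x", "y")),
        ("b", Array("y", "z"))
      ))

      // Each array element becomes its own (key, element) pair.
      val flat: RDD[(String, String)] = rdd.flatMapValues(values => values.toSeq)

      flat.collect().toSeq
    } finally {
      sc.stop()
    }
  }

  def main(args: Array[String]): Unit =
    run().foreach(println)
}
```

Running main should print the four pairs (a,x), (a,y), (b,y), (b,z), one per line.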

1 Comment

protip: It should be a Wiki answer instead since you simply gathered the comments.
