I have a PairRDD of the form RDD[(String, Array[String])]. I want to flatten the values so that I get an RDD[(String, String)] in which each element of an Array[String] in the first RDD becomes a separate element in the second RDD.
For instance, my first RDD has the following elements:
("a", Array("x", "y"))
("b", Array("y", "z"))
The result I want is this:
("a", "x")
("a", "y")
("b", "y")
("b", "z")
How can I do this? flatMapValues(f: Array[String] => TraversableOnce[String]) seems to be the right choice here, but what do I need to pass as the argument f?
rdd.flatMapValues(x => x)

Consider identity instead of x => x. The Scala compiler is probably clever enough to realize that x => x is identity, but if it is not, you create a new object.

What about rdd.flatMap { case (a, b) => b.map(a -> _) }? Does flatMapValues do anything different?

flatMap is not guaranteed to keep the partitioner of the original RDD (since there is no way to check that the keys remain the same), while flatMapValues does keep it. This matters for operations that require shuffling, such as joins.
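To make the difference concrete, here is a minimal sketch that runs both variants on the sample data from the question and then inspects the partitioner of each result. The object name FlattenValuesSketch, the helper flatten, the local[*] master, and the choice of a HashPartitioner with 2 partitions are assumptions made for illustration; they are not part of the original thread.

```scala
import org.apache.spark.{HashPartitioner, SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

// Hypothetical wrapper object, used only for this illustration.
object FlattenValuesSketch {

  // Emit one (key, value) pair per element of each value array.
  def flatten(rdd: RDD[(String, Array[String])]): RDD[(String, String)] =
    rdd.flatMapValues(x => x)

  def main(args: Array[String]): Unit = {
    // Assumption: a local Spark master is enough for this demonstration.
    val conf = new SparkConf().setAppName("flatten-values-sketch").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      val rdd = sc.parallelize(Seq(("a", Array("x", "y")), ("b", Array("y", "z"))))

      // Prints (a,x), (a,y), (b,y), (b,z), one pair per line.
      flatten(rdd).collect().foreach(println)

      // Give the RDD an explicit partitioner so there is something to preserve.
      val partitioned = rdd.partitionBy(new HashPartitioner(2))

      // flatMapValues cannot change keys, so the partitioner is carried over.
      val viaFlatMapValues = partitioned.flatMapValues(x => x)

      // flatMap may produce arbitrary keys, so Spark drops the partitioner.
      val viaFlatMap = partitioned.flatMap { case (k, vs) => vs.map(k -> _) }

      println(viaFlatMapValues.partitioner.isDefined) // expected: true
      println(viaFlatMap.partitioner.isDefined)       // expected: false
    } finally {
      sc.stop()
    }
  }
}
```

Both variants yield the same pairs; only the retained partitioner differs, which is what lets a later join on the flatMapValues result avoid an extra shuffle of that side.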