I have an RDD[(String, String, List[String])] and would like to "flatMap" it to obtain an RDD[(String, String, String)].

For example, given:

val x: RDD[(String, String, List[String])] =
RDD[(a, b, List("ra", "re", "ri"))]

I would like to get:

val result: RDD[(String, String, String)] =
RDD[(a, b, ra), (a, b, re), (a, b, ri)]

2 Answers

Use flatMap:

val rdd = sc.parallelize(Seq(("a", "b", List("ra", "re", "ri"))))
// rdd: org.apache.spark.rdd.RDD[(String, String, List[String])] = ParallelCollectionRDD[7] at parallelize at <console>:28

rdd.flatMap{ case (x, y, z) => z.map((x, y, _)) }.collect
// res23: Array[(String, String, String)] = Array((a,b,ra), (a,b,re), (a,b,ri))
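
To see what the flatMap above does to each row, here is a minimal sketch on plain Scala collections rather than an RDD. A Seq stands in for the RDD so the snippet runs without Spark; the object name FlattenTriples and the extra row with an empty list are made up for illustration. RDD.flatMap applies the function to each element in the same way.

```scala
// Plain Scala collections stand in for the RDD here (an assumption made so
// the snippet runs without a Spark dependency).
object FlattenTriples {
  // For every row, emit one triple per element of the row's list.
  def flatten(rows: Seq[(String, String, List[String])]): Seq[(String, String, String)] =
    rows.flatMap { case (x, y, zs) => zs.map(z => (x, y, z)) }

  def main(args: Array[String]): Unit = {
    val rows: Seq[(String, String, List[String])] =
      Seq(("a", "b", List("ra", "re", "ri")), ("c", "d", Nil))
    // The ("c", "d", Nil) row produces no output because its list is empty.
    flatten(rows).foreach(println)
  }
}
```

Note that a row whose list is empty contributes nothing to the result, so it disappears from the output entirely.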

This is an alternative way of doing it, again using flatMap, but with tuple accessors instead of pattern matching:

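// sparkContext is an existing SparkContext (sc in the spark-shell)
val rdd = sparkContext.parallelize(Seq(("a", "b", List("ra", "re", "ri"))))
// Pair each list element with the row's first two fields
rdd.flatMap(row => row._3.map(item => (row._1, row._2, item))).foreach(println)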

2 Comments

I do not agree: the type of the result here would be RDD[List[(String, String, String)]], while the OP asks for RDD[(String, String, String)].
Yes, you are right @GPI, let me update my answer :) Thanks for letting me know.
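
The nested type mentioned in the comment is what you get when the outer call is map rather than flatMap. Whether the earlier version of the answer used map is an assumption; the sketch below only illustrates the type difference, again on plain Scala collections with a hypothetical object name.

```scala
// Plain collections stand in for the RDD (assumption: no Spark dependency).
object MapVsFlatMap {
  val rows: Seq[(String, String, List[String])] =
    Seq(("a", "b", List("ra", "re", "ri")))

  // map keeps one output element per input row, so each element is a List.
  val nested: Seq[List[(String, String, String)]] =
    rows.map(t => t._3.map(s => (t._1, t._2, s)))

  // flatMap concatenates those lists into a single flat sequence.
  val flat: Seq[(String, String, String)] =
    rows.flatMap(t => t._3.map(s => (t._1, t._2, s)))
}
```

The explicit type annotations make the compiler check the difference: swapping map and flatMap in either definition no longer compiles.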
