
I am new to Spark programming. I am trying to extract values from an RDD that currently produces the output below:

(CBI10006,(Some(Himanshu Vasani),None))
(CBI10004,(Some(Sonam Petro),Some(8500)))
(CBI10003,(None,Some(3000)))

I want to turn those values into the following form:

(CBI10006,Himanshu Vasani,'')
(CBI10004,Sonam Petro,8500)
(CBI10003,'',3000)

I tried a flatMap approach:

joined.flatMap{case(f1,f2) => (f1,(f2._1,f2._2))}

but I get the following error:

type mismatch;
 found   : (String, (Option[String], Option[String]))
 required: TraversableOnce[?]
    joined.flatMap{case(f1,f2) => (f1,(f2._1,f2._2))}
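Why the compiler rejects this: flatMap expects its function to return a collection (TraversableOnce in Scala 2.12 and earlier, IterableOnce in 2.13), and the lambda above returns a single tuple. map has no such requirement. The sketch below assumes a plain Scala Seq as a stand-in for the RDD so it runs without a cluster; the object and value names are hypothetical, and amounts are Option[String] to match the types in the error message.

```scala
// A plain Seq stands in for the Spark RDD; map/flatMap follow the same typing rule.
object FlatMapTypeDemo {
  // Hypothetical stand-in for the `joined` RDD from the question.
  val joined: Seq[(String, (Option[String], Option[String]))] = Seq(
    ("CBI10006", (Some("Himanshu Vasani"), None)),
    ("CBI10004", (Some("Sonam Petro"), Some("8500"))),
    ("CBI10003", (None, Some("3000")))
  )

  // Does NOT compile: the function returns one tuple, not a collection.
  // joined.flatMap { case (f1, f2) => (f1, (f2._1, f2._2)) }

  // map: exactly one output per input, so returning a tuple is fine.
  val viaMap: Seq[(String, String, String)] =
    joined.map { case (id, (name, amount)) =>
      (id, name.getOrElse(""), amount.getOrElse(""))
    }

  // flatMap compiles once the function returns a collection (a one-element Seq here).
  val viaFlatMap: Seq[(String, String, String)] =
    joined.flatMap { case (id, (name, amount)) =>
      Seq((id, name.getOrElse(""), amount.getOrElse("")))
    }

  def main(args: Array[String]): Unit = viaMap.foreach(println)
}
```

Both versions yield the same rows; wrapping each result in a one-element Seq just to satisfy flatMap adds nothing over map.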
  • In your case a map would work, unless you want an RDD[List[String]] as a result. Commented Jan 4, 2022 at 10:59
  • I am pretty sure you can just call values on it. Commented Jan 4, 2022 at 11:47
  • @LuisMiguelMejíaSuárez Using flatMap? Commented Jan 5, 2022 at 2:00
  • @DhrumilShah No, I mean that you could just call values on joined to get the answer you want. Commented Jan 5, 2022 at 3:15

1 Answer

Using map():

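// Sample data mirroring the joined RDD (amount is an Option[Int] here)
val data = Seq(
  ("CBI10006", (Some("Himanshu Vasani"), None)),
  ("CBI10004", (Some("Sonam Petro"), Some(8500))),
  ("CBI10003", (None, Some(3000))))

spark.sparkContext            // `spark` is the SparkSession provided by spark-shell
  .parallelize(data)
  // Unwrap each Option, using "" when the value is missing; the amount is
  // converted to String so the result is RDD[(String, String, String)]
  .map { case (id, (name, amount)) => (id, name.getOrElse(""), amount.map(_.toString).getOrElse("")) }
  .collect()                  // bring results to the driver so println output appears locally
  .foreach(println)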

// output: 
// (CBI10006,Himanshu Vasani,)
// (CBI10004,Sonam Petro,8500)
// (CBI10003,,3000)

2 Comments

Thanks @Gabib, it's working! Just for my understanding, could you also show the flatMap option?
@DhrumilShah there is no point in using flatMap for a one to one mapping. Why do you want to use that?
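For comparison, flatMap earns its place when one input can produce zero or several outputs. A minimal sketch, again assuming a plain Scala Seq as a stand-in for the RDD: the extra record CBI10099 and the rule "drop rows with no data on either side" are hypothetical, added only to show a case where flatMap does something map cannot.

```scala
object FlatMapFilterDemo {
  // Hypothetical stand-in for the joined RDD; CBI10099 is an invented record
  // with no data on either side of the join.
  val joined: Seq[(String, (Option[String], Option[String]))] = Seq(
    ("CBI10006", (Some("Himanshu Vasani"), None)),
    ("CBI10004", (Some("Sonam Petro"), Some("8500"))),
    ("CBI10003", (None, Some("3000"))),
    ("CBI10099", (None, None))
  )

  // Each input yields zero rows (both sides missing) or exactly one row,
  // and flatMap flattens the results into a single collection.
  val cleaned: Seq[(String, String, String)] = joined.flatMap {
    case (_, (None, None))    => Seq.empty[(String, String, String)]
    case (id, (name, amount)) => Seq((id, name.getOrElse(""), amount.getOrElse("")))
  }

  def main(args: Array[String]): Unit = cleaned.foreach(println)
}
```

The same effect could be had with filter followed by map; flatMap simply folds both steps into one pass.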
