
I am new to Spark/Scala development. I am trying to create an RDD from the values of a Map in Spark using Scala, but I am getting a type mismatch error.

scala> val nums = sc.parallelize(Map("red" -> "#FF0000","azure" -> "#F0FFFF","peru" -> "#CD853F"))
<console>:21: error: type mismatch;
 found   : scala.collection.immutable.Map[String,String]
 required: Seq[?]
Error occurred in an application involving default arguments.
       val nums = sc.parallelize(Map("red" -> "#FF0000","azure" -> "#F0FFFF","peru" -> "#CD853F"))

How should I do this?

1 Comment

  • Looks like the parallelize function expects a Seq, not a Map. If you still need to start from a Map, you can convert it to a Seq first: yourMap.toSeq. Try it. Commented Nov 7, 2015 at 21:36

1 Answer


SparkContext.parallelize transforms a Seq[T] into an RDD[T]. If you want to create an RDD[(String, String)] where each element is an individual key-value pair from the original Map, use:

import org.apache.spark.rdd.RDD

val m = Map("red" -> "#FF0000", "azure" -> "#F0FFFF", "peru" -> "#CD853F")
val rdd: RDD[(String, String)] = sc.parallelize(m.toSeq)

If you want an RDD[Map[String, String]] (not that it makes much sense with a single element), use:

val rdd: RDD[Map[String,String]] = sc.parallelize(Seq(m))
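
For completeness, here is a minimal sketch (assuming the m and sc defined above; the pairs value is just the same RDD as the first snippet under a different name) showing that the first form is an ordinary pair RDD, so PairRDDFunctions such as keys, values, lookup and collectAsMap are available on it:

import org.apache.spark.rdd.RDD

// pair RDD built from the map, as in the first snippet above
val pairs: RDD[(String, String)] = sc.parallelize(m.toSeq)

pairs.keys.collect()      // Array(red, azure, peru) for this input
pairs.values.collect()    // Array(#FF0000, #F0FFFF, #CD853F)
pairs.lookup("red")       // Seq(#FF0000)
pairs.collectAsMap()      // scala.collection.Map with all three entries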

4 Comments

Thanks to you, man, we will soon have the whole documentation of Spark on SO :)
I almost choked on my coffee :)
Thanks zaro322. I was wondering how I can access the keys from the map? I tried rdd.keys() but got the error below: error: org.apache.spark.rdd.RDD[String] does not take parameters rdd.keys() Any suggestion?
rdd.keys - if a method in Scala doesn't take parameters, it should be called without parentheses.
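
To make that comment concrete, a small sketch (reusing the rdd of key-value pairs from the answer) of the difference between the two call forms:

// keys is declared without a parameter list, so call it without parentheses
val ks = rdd.keys              // RDD[String] containing red, azure, peru
ks.collect().foreach(println)

// rdd.keys() fails: keys already returns an RDD[String], and the trailing ()
// is interpreted as applying that RDD to arguments, hence the
// "RDD[String] does not take parameters" error quoted above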
