
My DataFrame looks like this:

+-------------------+-------------+
|        Nationality|    continent|
+-------------------+-------------+
|       Turkmenistan|         Asia|
|         Azerbaijan|         Asia|
|             Canada|North America|
|         Luxembourg|       Europe|
|             Gambia|       Africa|
+-------------------+-------------+

My output should look like this:

Map(Gibraltar -> Europe, Haiti -> North America)

So, I'm trying to convert the data-frame into

scala.collection.mutable.Map[String, String]()

I'm trying with the following code:

    val encoder = Encoders.product[(String, String)]
    val countryToContinent = scala.collection.mutable.Map[String, String]()
    val mapped = nationalityDF.mapPartitions((it) => {
        ....
        ....
        countryToContinent.toIterator
    })(encoder).toDF("Nationality", "continent").as[(String, String)](encoder)

    val map = mapped.rdd.groupByKey.collect.toMap

    val map = mapped.rdd.groupByKey.collect.toMap

But the result map has the following output:

Map(Gibraltar -> CompactBuffer(Europe), Haiti -> CompactBuffer(North America))

How can I get the hash-map result without the CompactBuffer?

1 Answer


Let's create some data:

val df = Seq(
  ("Turkmenistan", "Asia"),
  ("Azerbaijan", "Asia"))
  .toDF("Country", "Continent")

Try mapping each row into a tuple first, then collect into a map:

df.map { r => (r.getString(0), r.getString(1)) }.collect.toMap

Output:

scala.collection.immutable.Map[String,String] = Map(Turkmenistan -> Asia, Azerbaijan -> Asia)
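Note that .collect.toMap produces an immutable Map, while the question asked for a scala.collection.mutable.Map. The immutable result can be copied into a mutable one; a minimal sketch using plain Scala collections (the literal map below stands in for the collected Spark result):

```scala
import scala.collection.mutable

object MutableMapSketch {
  def main(args: Array[String]): Unit = {
    // Stand-in for the immutable Map returned by .collect.toMap above
    val collected = Map("Turkmenistan" -> "Asia", "Azerbaijan" -> "Asia")

    // Copy the immutable result into a mutable Map
    val countryToContinent: mutable.Map[String, String] =
      mutable.Map(collected.toSeq: _*)

    // Now entries can be added or changed in place
    countryToContinent("Canada") = "North America"

    println(countryToContinent("Canada"))
  }
}
```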

5 Comments

Thank you, it's working fine. I just added an encoder to the map: newDF.map{ r => (r.getString(0), r.getString(1))}(encoder).collect.toMap
Hello @Hanan, why did you add the encoder explicitly? In this case I don't think you need to set it. I would avoid messing with encoders unless it is necessary :) i.e. when Spark can't recognize the encoding of my custom type! Here you can find a nice explanation of encoders: github.com/vaquarkhan/Apache-Kafka-poc-and-notes/wiki/…
Actually it gave an error before I added the encoder: Error:(58, 34) not enough arguments for method map: (implicit evidence$6: org.apache.spark.sql.Encoder[(String, String)])org.apache.spark.sql.Dataset[(String, String)]. Unspecified value parameter evidence$6. countryToContinent = newDF.map { r => (r.getString(0), r.getString(1)) }.collect.toMap
Right, I see, you should completely remove the encoder then! You don't really need it. Just importing spark.implicits._ should be enough.
Thanks much! Yes, you're right... it works fine using spark.implicits._
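Putting the comment thread together, the final working version without an explicit encoder might look like this (a sketch, not a tested build: nationalityDF is the DataFrame from the question, and spark is the active SparkSession whose implicits supply Encoder[(String, String)]):

```scala
import spark.implicits._  // provides the implicit Encoder[(String, String)] needed by .map

val countryToContinent: Map[String, String] =
  nationalityDF
    .map { r => (r.getString(0), r.getString(1)) }  // Row -> (Nationality, continent) tuple
    .collect
    .toMap
```

With the tuple encoder resolved implicitly, there is no need for Encoders.product or for groupByKey, so no CompactBuffer appears in the result.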
