
My DataFrame looks like this:

+-------------------+-------------+
|        Nationality|    continent|
+-------------------+-------------+
|       Turkmenistan|         Asia|
|         Azerbaijan|         Asia|
|             Canada|North America|
|         Luxembourg|       Europe|
|             Gambia|       Africa|
+-------------------+-------------+

My output should look like this:

Map(Gibraltar -> Europe, Haiti -> North America)

So, I'm trying to convert the data-frame into

scala.collection.mutable.Map[String, String]()

I'm trying with the following code:

    val encoder = Encoders.product[(String, String)]
    val countryToContinent = scala.collection.mutable.Map[String, String]()
    val mapped = nationalityDF.mapPartitions((it) => {
        ....
        ....
        countryToContinent.toIterator
    })(encoder).toDF("Nationality", "continent").as[(String, String)](encoder)

    val map = mapped.rdd.groupByKey.collect.toMap

    val map = mapped.rdd.groupByKey.collect.toMap

But the result map has the following output:

Map(Gibraltar -> CompactBuffer(Europe), Haiti -> CompactBuffer(North America))

How can I get the hash-map result without the CompactBuffer?

1 Answer


Let's create some data:

val df = Seq(
  ("Turkmenistan", "Asia"),
  ("Azerbaijan", "Asia"))
  .toDF("Country", "Continent")

Try mapping each row into a tuple first, then collect into a map:

df.map { r => (r.getString(0), r.getString(1)) }.collect.toMap

Output:

scala.collection.immutable.Map[String,String] = Map(Turkmenistan -> Asia, Azerbaijan -> Asia)
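Note that .collect.toMap produces an immutable Map, while the question asked for a scala.collection.mutable.Map. The immutable result can be copied into a mutable one; a minimal sketch using plain Scala collections (the literal map below stands in for the collected Spark result):

```scala
import scala.collection.mutable

object MutableMapSketch {
  def main(args: Array[String]): Unit = {
    // Stand-in for the immutable Map returned by .collect.toMap above
    val collected = Map("Turkmenistan" -> "Asia", "Azerbaijan" -> "Asia")

    // Copy the immutable result into a mutable Map
    val countryToContinent: mutable.Map[String, String] =
      mutable.Map(collected.toSeq: _*)

    // Now entries can be added or changed in place
    countryToContinent("Canada") = "North America"

    println(countryToContinent("Canada"))
  }
}
```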

5 Comments

Thank you, it's working fine. I just added an encoder to the map: newDF.map{ r => (r.getString(0), r.getString(1))}(encoder).collect.toMap
Hello @Hanan, why did you add the encoder explicitly? In this case I don't think you need to set it. I would avoid messing with encoders unless it is necessary :) i.e. when Spark can't recognize the encoding of my custom type! Here you can find a nice explanation of encoders: github.com/vaquarkhan/Apache-Kafka-poc-and-notes/wiki/…
Actually it gave an error before I added the encoder: Error:(58, 34) not enough arguments for method map: (implicit evidence$6: org.apache.spark.sql.Encoder[(String, String)])org.apache.spark.sql.Dataset[(String, String)]. Unspecified value parameter evidence$6. countryToContinent = newDF.map { r => (r.getString(0), r.getString(1)) }.collect.toMap
Right, I see, you should completely remove the encoder then! You don't really need it. Just importing spark.implicits._ should be enough.
Thanks much! Yes, you're right... it works fine using spark.implicits._
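Putting the comment thread together, the final working version without an explicit encoder might look like this (a sketch, not a tested build: nationalityDF is the DataFrame from the question, and spark is the active SparkSession whose implicits supply Encoder[(String, String)]):

```scala
import spark.implicits._  // provides the implicit Encoder[(String, String)] needed by .map

val countryToContinent: Map[String, String] =
  nationalityDF
    .map { r => (r.getString(0), r.getString(1)) }  // Row -> (Nationality, continent) tuple
    .collect
    .toMap
```

With the tuple encoder resolved implicitly, there is no need for Encoders.product or for groupByKey, so no CompactBuffer appears in the result.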
