
I have a CSV file where one of the fields contains a map, as shown below: "Map(12345 -> 45678, 23465 -> 9876)"

When I try to load the CSV into a DataFrame, this field is read as a string, so I have written a UDF to convert the string to a map, as below:

val convertToMap = udf((pMap: String) => {
  val mpp = pMap
  // "Map(12345 -> 45678, 23465 -> 9876)"
  val stg = mpp.substr(4, mpp.length() - 1)
  val stg1 = stg.split(regex=",").toList
  val mp = stg1.map(_.split(regex=" ").toList)
  val mp1 = mp.map(mp => (mp(0), mp(2))).toMap
})

Now I need help applying the UDF to the column that is being read as a string, so that I get back the DataFrame with the converted column.

1 Answer

You are pretty close, but it looks like your UDF is a mix of Scala and Python, and the parsing logic needs a little work. There may be a better way to parse a map literal string, but this works with the provided example:

val convertToMap = udf { (pMap: String) =>
  // strip the leading "Map(" and the trailing ")"
  val stg = pMap.substring(4, pMap.length() - 1)
  // split into "key -> value" entries and trim surrounding whitespace
  val stg1 = stg.split(",").toList.map(_.trim)
  // split each entry on spaces: index 0 is the key, index 2 is the value
  val mp = stg1.map(_.split(" ").toList)
  mp.map(mp => (mp(0), mp(2))).toMap
}

import spark.implicits._ // needed for the Seq encoder, .toDF, and the $"..." syntax

val df = spark.createDataset(Seq("Map(12345 -> 45678, 23465 -> 9876)")).toDF("strMap")

With the corrected UDF, you simply invoke it with a .select() or a .withColumn():

df.select(convertToMap($"strMap").as("map")).show(false)

Which gives:

+----------------------------------+
|map                               |
+----------------------------------+
|Map(12345 -> 45678, 23465 -> 9876)|
+----------------------------------+

With the schema:

root
 |-- map: map (nullable = true)
 |    |-- key: string
 |    |-- value: string (valueContainsNull = true)
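
Note that both the keys and the values come back as strings. If you need them as integers, you should be able to cast the map column directly after conversion, since Spark can cast map<string,string> to map<int,int> (a sketch, not tested against your actual data):

df.select(convertToMap($"strMap").cast("map<int,int>").as("map")).show(false)

As an aside, Spark SQL also ships a built-in str_to_map function that can replace the UDF entirely, as long as you strip the Map( prefix and ) suffix first. Its pair and key/value delimiters are treated as regular expressions. A sketch of that approach, using the same df as above:

import org.apache.spark.sql.functions.expr

df.select(expr("""str_to_map(regexp_replace(strMap, '^Map\\(|\\)$', ''), ', ', ' -> ')""").as("map")).show(false)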

2 Comments

Thank you, Travis, for the quick response. I see that you have taken the string value directly as a sequence and created a dataset. However, in my case, it should consider each value from that column (say DealMap is my column in the x dataframe), and when I try the below, it throws an error: val df1 = spark.createDataset(Seq(actual_df.select("DealMap"))).toDF("strMap") Please suggest.
I only have the createDataset call to show a working example. Since you already have a dataframe, skip straight to the .select() step: val df1 = actual_df.select(convertToMap($"DealMap"))
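
If you want to keep all the other columns of actual_df and just replace DealMap in place, the .withColumn() route mentioned in the answer does that (a sketch using the column names from the comment above):

val df1 = actual_df.withColumn("DealMap", convertToMap($"DealMap"))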
