
Similar to this question, I want to add a column to my PySpark DataFrame containing nothing but an empty map. If I use the suggested answer from that question, however, the type of the map is <null,null>, unlike in the answer posted there.

from pyspark.sql.functions import create_map
spark.range(1).withColumn("test", create_map()).printSchema()

root
 |-- test: map (nullable = false)
 |    |-- key: null
 |    |-- value: null (valueContainsNull = false)

I need an empty <string,string> map. I can do it in Scala like so:

import org.apache.spark.sql.functions.typedLit
spark.range(1).withColumn("test", typedLit(Map[String, String]())).printSchema()

root
 |-- test: map (nullable = false)
 |    |-- key: string
 |    |-- value: string (valueContainsNull = true)

How can I do it in PySpark? I am using Spark 3.0.1 with underlying Scala 2.12 on Databricks Runtime 7.3 LTS. I need the <string,string> map because otherwise I can't save my DataFrame to Parquet:

AnalysisException: Parquet data source does not support map<null,null> data type.;

1 Answer


You can create the map with create_map and then cast it to the appropriate type:
from pyspark.sql.functions import create_map
spark.range(1).withColumn("test", create_map().cast("map<string,string>")).printSchema()

root
 |-- id: long (nullable = false)
 |-- test: map (nullable = false)
 |    |-- key: string
 |    |-- value: string (valueContainsNull = true)