
I did a groupBy on my column and for some reason my DataFrame looks like this:

ID           col
1            [item1 -> 2, -> 3, item3 -> 4, -> 5]
2            [item2 -> 1, -> 7, item3 -> 2, -> 7]

I want to remove the key-value pairs that have an empty or null key.

I want something like this:

ID           col
1            [item1 -> 2, item3 -> 4]
2            [item2 -> 1, item3 -> 2]
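
For context, here is a minimal sketch that reproduces a DataFrame of this shape (simplified to a single empty-string key per row, since a map cannot hold duplicate keys; all names are illustrative):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Simplified reproduction: a map<string, long> column where an
# empty-string key stands in for the keyless entries above.
df = spark.createDataFrame(
    [(1, {"item1": 2, "": 3, "item3": 4}),
     (2, {"item2": 1, "": 7, "item3": 2})],
    ["ID", "col"],
)
df.show(truncate=False)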

My Approach

from pyspark.sql.functions import expr

# Keep only the entries whose key is a non-empty string
dsNew = ds.withColumn("col", expr("map_filter(col, (k, v) -> k != '')"))

But the map_filter function is not available in my version of PySpark.

1 Answer

map_filter() is available from version 3.1.0. However, your column needs to be of map type, for example:

root
 |-- id: long (nullable = true)
 |-- data: map (nullable = true)
 |    |-- key: string
 |    |-- value: double (valueContainsNull = true)
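
You can confirm the column type with printSchema(), assuming the DataFrame is named df:

df.printSchema()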

Then you could do the following:

from pyspark.sql import functions as F

# Keep only the entries whose key is a non-empty string
df = df.withColumn("filtered_data", F.map_filter("data", lambda k, v: k != ""))
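
If you are stuck on a version older than 3.1.0, one possible workaround is to explode the map, filter out the empty keys, and rebuild the map with map_from_entries(), which has been available since Spark 2.4. A sketch, assuming the same df and column names as above:

from pyspark.sql import functions as F

# Explode the map into (key, value) rows, drop the empty keys,
# then collect the surviving entries back into a map per id.
filtered = (
    df.select("id", F.explode("data").alias("k", "v"))
      .where(F.col("k") != "")
      .groupBy("id")
      .agg(F.map_from_entries(F.collect_list(F.struct("k", "v"))).alias("data"))
)

Note that this drops rows whose map is null or contains only empty keys, since they produce no rows to group.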