I have the following dataframe:
+---+---------+
| ID| Title|
+---+---------+
| 1|[2, test]|
| 3| [4,]|
+---+---------+
created using the code below:
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, expr
from pyspark.sql.types import StructType, StructField, StringType, IntegerType

spark = SparkSession.builder.getOrCreate()

data = [(1, [2, 'test']), (3, [4, None])]
schema = StructType([
    StructField("ID", IntegerType(), False),
    StructField("Title", StructType([
        StructField("TitleID", IntegerType(), False),
        StructField("Type", StringType(), True),
    ]), False),
])
df = spark.createDataFrame(data, schema)
Now I'm trying to replace the null title types with a default value. I first tried fillna, but it has no effect:
default_type = 'type one'
df = df.fillna({'Title.Type':default_type})
I have also tried using an expr:
df = df.withColumn('Title', expr('struct(Title.TitleID, Title.Type if Title.Type.isNotNull() else default_type'))
but that gives me a ParseException:
ParseException:
extraneous input 'Title' expecting {')', ','}(line 1, pos 36)
== SQL ==
struct(Title.TitleID, Title.Type if Title.Type.isNotNull() else default_type
------------------------------------^^^
What am I doing wrong here?