I have the following dataframe:

+---+---------+
| ID|    Title|
+---+---------+
|  1|[2, test]|
|  3|     [4,]|
+---+---------+

created using the code below:

from pyspark.sql.types import StructType, StructField, StringType, IntegerType
from pyspark.sql.functions import col, expr
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

data = [(1, [2, 'test']), (3, [4, None])]

schema = StructType([
    StructField("ID", IntegerType(), False),
    StructField("Title", StructType([
        StructField("TitleID", IntegerType(), False),
        StructField("Type", StringType(), True),
    ]), False),
])

df = spark.createDataFrame(data, schema)

Now I'm trying to replace the null title types with a default value. I have tried this using fillna, but it has no effect:

default_type = 'type one'
df = df.fillna({'Title.Type':default_type})

I have also tried using an expr:

df = df.withColumn('Title', expr('struct(Title.TitleID, Title.Type if Title.Type.isNotNull() else default_type'))

but now I get a ParseException:

ParseException: 
extraneous input 'Title' expecting {')', ','}(line 1, pos 36)

== SQL ==
struct(Title.TitleID, Title.Type if Title.Type.isNotNull() else default_type
------------------------------------^^^

What am I doing wrong here?

1 Answer
You're confusing Spark SQL syntax with Python syntax: expr parses a SQL expression, so Python's `x if cond else y` and the `.isNotNull()` method are not valid there. Use SQL's CASE WHEN instead, and close the parenthesis on struct:

import pyspark.sql.functions as F

df = df.withColumn(
    'Title', 
    F.expr(f"struct(Title.TitleID as TitleID, case when Title.Type is not null then Title.Type else '{default_type}' end as Type)")
)