I have two sets of queries with multiple case statements, and I need to implement the same logic in PySpark. I tried, but I'm having difficulty chaining multiple when conditions. Any help would be appreciated.
FIRST QUERY
case
when appointment_date is null
then 0
when resolution_desc in (
'CSTXCL - OK BY PHONE'
)
or resolution_desc ilike '%NO VAN ROLL%'
then 0
when status in ('PENDING','CANCELLED')
then 0
when ticket_type = 'install'
and appointment_required is true
end as truck_roll
SECOND QUERY
case when status = 'COMPLETED' and resolution not in ('CANCELLING ORDER','CANCEL ORDER')
then 1 else 0 end as completed,
case when status = 'CANCELLED' or ( status in ('COMPLETED','PENDING' ) and resolution_desc in ('CANCELLING ORDER','CANCEL ORDER') ) then 1 else 0 end as cancelled.
I tried the code below for the second query, but it is not working:
sparkdf.withColumn('completed', f.when((sparkdf.ticket_status =='COMPLETED') & (~sparkdf.resolution_description.isin('CANCELLING ORDER','CANCEL ORDER','CLOSE SRO')),1).otherwise(0))\
.withColumn('cancelled', f.when((sparkdf.ticket_status == 'CANCELLED') | (sparkdf.ticket_status.isin('COMPLETED','PENDING')) & (sparkdf.resolution_description.isin('CANCELLING ORDER','CANCEL ORDER','CLOSE SRO')),1).otherwise(0))
Register the DataFrame as a temporary view with df.createOrReplaceTempView('df_view'), then run your existing SQL against that view: df = spark.sql('''your_sql_query from df_view''')
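If you would rather stay in the DataFrame API, the first query can probably be written as a single chain of when calls, since each .when() is only evaluated if the earlier ones did not match, just like CASE. Below is a minimal sketch, not a definitive translation. It assumes the column names from the SQL (appointment_date, resolution_desc, status, ticket_type, appointment_required), that appointment_required is a boolean column, and that the last branch should return 1 (the posted query has no THEN for it, so that value is a guess). The helper names with_truck_roll and truck_roll_reference are made up for illustration. The plain-Python function mirrors the same branch order so you can check the expected value for a sample row without Spark.

```python
def truck_roll_reference(row):
    """Plain-Python mirror of the first CASE, for checking sample rows (row is a dict)."""
    if row.get("appointment_date") is None:
        return 0
    desc = row.get("resolution_desc")
    # A NULL resolution_desc makes the SQL condition NULL, so the branch is skipped.
    if desc is not None and (
        desc == "CSTXCL - OK BY PHONE" or "no van roll" in desc.lower()
    ):
        return 0
    if row.get("status") in ("PENDING", "CANCELLED"):
        return 0
    if row.get("ticket_type") == "install" and row.get("appointment_required") is True:
        return 1  # ASSUMPTION: the original query does not state this THEN value
    return None  # no ELSE in the SQL, so unmatched rows are NULL


def with_truck_roll(df):
    """Add a truck_roll column to a Spark DataFrame using chained when()."""
    # Imported here so the reference function above can be used without Spark installed.
    from pyspark.sql import functions as F

    desc = F.col("resolution_desc")
    truck_roll = (
        F.when(F.col("appointment_date").isNull(), 0)
        # lower(...) LIKE a lowercase pattern stands in for ILIKE
        .when(desc.isin("CSTXCL - OK BY PHONE") | F.lower(desc).like("%no van roll%"), 0)
        .when(F.col("status").isin("PENDING", "CANCELLED"), 0)
        # ASSUMPTION: 1 is the intended result of the final branch
        .when((F.col("ticket_type") == "install") & F.col("appointment_required"), 1)
        # no .otherwise(): rows matching no branch stay NULL, as in the SQL
    )
    return df.withColumn("truck_roll", truck_roll)
```

Usage would be something like sparkdf = with_truck_roll(sparkdf). Note that withColumn returns a new DataFrame, so the result has to be assigned back. Wrapping each comparison in its own parentheses before combining with & or | also matters, because those operators bind more tightly than == in Python.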