I have two sets of queries with multiple case statements, and I need to implement the same logic in PySpark. I tried, but I'm having difficulty chaining multiple when conditions. Any help would be appreciated.

FIRST QUERY

case
when appointment_date is null
then 0
when resolution_desc in (
'CSTXCL - OK BY PHONE'
)
or resolution_desc ilike '%NO VAN ROLL%'
then 0
when status in ('PENDING','CANCELLED')
then 0
when ticket_type = 'install'
and appointment_required is true
end as truck_roll

SECOND QUERY

case when status = 'COMPLETED'  and resolution not in ('CANCELLING ORDER','CANCEL ORDER')
then 1 else 0 end as completed, 
case when status = 'CANCELLED'  or ( status in ('COMPLETED','PENDING' ) and resolution_desc in ('CANCELLING ORDER','CANCEL ORDER') ) then 1 else 0 end as cancelled

I tried the code below for the second query, but it is not working:

sparkdf.withColumn('completed', f.when((sparkdf.ticket_status =='COMPLETED') & (~sparkdf.resolution_description.isin('CANCELLING ORDER','CANCEL ORDER','CLOSE SRO')),1).otherwise(0))\
.withColumn('cancelled', f.when((sparkdf.ticket_status == 'CANCELLED') | (sparkdf.ticket_status.isin('COMPLETED','PENDING')) & (sparkdf.resolution_description.isin('CANCELLING ORDER','CANCEL ORDER','CLOSE SRO')),1).otherwise(0))

  • You'll have an easier time registering your DataFrame as a SQL view and running the same, slightly adjusted SQL code against that view. To register it, use df.createOrReplaceTempView('df_view'); to run the SQL, use df = spark.sql('''your_sql_query from df_view'''). Commented Dec 20, 2021 at 7:59

1 Answer

You can use the "expr" function to execute SQL code (triple-quoted here because the expression spans multiple lines):

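from pyspark.sql.functions import expr

# withColumn returns a new DataFrame, so keep the result
sparkdf = sparkdf.withColumn(
    'completed',
    # The SQL CASE expression is evaluated per row by Spark
    expr('''
           CASE WHEN status = 'COMPLETED'
                  AND resolution NOT IN ('CANCELLING ORDER',
                                         'CANCEL ORDER') THEN 1
                ELSE                                          0
           END
         '''
        )
)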

Of course, you would do the same for the "cancelled" column.
