
How can I use a CASE condition while joining two DataFrames in Spark?

    var date_a = s"""CASE
      WHEN month(to_date(from_unixtime(unix_timestamp(dt, 'dd-MM-yyyy')))) IN (01,02,03)
        THEN CONCAT(CONCAT(year(to_date(from_unixtime(unix_timestamp(dt, 'dd-MM-yyyy')))) - 1, '-'),
             SUBSTR(year(to_date(from_unixtime(unix_timestamp(dt, 'dd-MM-yyyy')))), 3, 4))
      ELSE CONCAT(CONCAT(year(to_date(from_unixtime(unix_timestamp(dt, 'dd-MM-yyyy')))), '-'),
             SUBSTR(year(to_date(from_unixtime(unix_timestamp(dt, 'dd-MM-yyyy')))) + 1, 3, 4))
      END"""

    val gstr1_amend = df1.join(gstr1_amend_lkup_data, df1("date_b") === df2(date_a))

But I am getting an error saying the CASE expression is not a column.

  • Are you still facing this issue? Commented May 21, 2020 at 10:31

2 Answers


I had a similar situation with a minor difference: I wanted to use a column from the second DataFrame whenever the corresponding column from the first DataFrame was blank, and only at join time. I couldn't use a CASE in the join condition itself, so instead I joined on another key column and applied the CASE logic in a filter afterwards. It isn't an elegant solution, but it works.
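A minimal sketch of that workaround, with made-up frame and column names (`dfA`, `dfB`, `period_a`, `period_b` are illustrative, not from the question): join on a shared key first, then express the CASE-style rule with `when`/`otherwise` inside a `filter`.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, when}

val spark = SparkSession.builder.master("local[*]").appName("join-then-filter").getOrCreate()
import spark.implicits._

// Hypothetical sample data sharing a key column "id".
val dfA = Seq((1, ""), (2, "2019-20")).toDF("id", "period_a")
val dfB = Seq((1, "2019-20"), (2, "2018-19")).toDF("id", "period_b")

// Join on the key column only; the conditional comparison comes afterwards.
val joined = dfA.join(dfB, Seq("id"))

// CASE-in-a-filter: when period_a is blank, fall back to dfB's value;
// otherwise require the two periods to match.
val result = joined.filter(
  when(col("period_a") === "", col("period_b").isNotNull)
    .otherwise(col("period_a") === col("period_b"))
)
result.show()
```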


Instead of putting a CASE statement in the join condition, precompute the value with the `when` and `otherwise` functions inside `withColumn`, and then join on that plain column, like below.

    val df2 = somedf
      .withColumn("date_a", when([...]).otherwise([...])) // [...] is your CASE-statement logic

    val gstr1_amend = df1.join(df2, df1("date_b") === df2("date_a"))
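A self-contained sketch of this approach, with made-up sample data and the fiscal-year logic from the question (months 1–3 roll back to the previous year). It assumes Spark 2.2+ so `to_date(col, fmt)` can replace the `from_unixtime(unix_timestamp(...))` chain; the sample values and the `tag`/`amount` columns are illustrative only.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder.master("local[*]").appName("case-join").getOrCreate()
import spark.implicits._

// Hypothetical sample frames: df1 carries the precomputed label, somedf the raw date.
val df1 = Seq(("2019-20", 100)).toDF("date_b", "amount")
val somedf = Seq(("15-02-2020", "A"), ("10-07-2019", "B")).toDF("dt", "tag")

// Fiscal year: Jan-Mar belong to the year that started the previous April.
val d  = to_date(col("dt"), "dd-MM-yyyy")
val fy = when(month(d).isin(1, 2, 3), year(d) - 1).otherwise(year(d))

// Build e.g. "2019-20" once, in a real column, instead of inside the join.
val df2 = somedf.withColumn(
  "date_a",
  concat(fy.cast("string"), lit("-"), substring((fy + 1).cast("string"), 3, 2))
)

// Plain column-to-column join; no CASE expression needed in the condition.
val gstr1_amend = df1.join(df2, df1("date_b") === df2("date_a"))
gstr1_amend.show()
```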

