Hi all, I have two DataFrames and I am joining them. After the join, I want all the rows from the first DataFrame whose id, name, code, and lastname do not match the second DataFrame. I have written the code below.

    val df3=df1.join(df2,df1("name") !==  df2("name_2")  && 
    df1("id") !== df2("id_2") &&
    df1("code") !==  df2("code_2") && 
    df1("lastname") !==  df2("lastname_2"),"inner")
    .drop(df2("id_2"))
    .drop(df2("name_2"))
    .drop(df2("code_2"))
    .drop(df2("lastname_2"))

Expected result:

    DF1
    id,name,code,lastname
    1,A,001,p1
    2,B,002,p2
    3,C,003,p3

    DF2
    id_2,name_2,code_2,lastname_2
    1,A,001,p1
    2,B,002,p4
    4,D,004,p4


    DF3
    id,name,code,lastname
    3,C,003,p3

Can someone please tell me whether this is the correct way to do it, or should I use a SQL subquery with 'NOT IN'? I am new to Spark and am using the DataFrame methods for the first time, so I am not sure whether this is correct.

  • You can use except to get the records from df1 that are not in df2: df1.except(df2). Commented May 18, 2020 at 20:14

1 Answer

I recommend using the Spark DataFrame API for this. Sample data:

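    // Needed for toDF and the $"..." column syntax (assumes a SparkSession named spark)
    import spark.implicits._

    val df1 =
      Seq((1, "20181231"), (2, "20190102"), (3, "20190103"), (4, "20190104"), (5, "20190105")).toDF("id", "date")

    val df2 =
      Seq((1, "20181231"), (2, "20190102"), (4, "20190104"), (5, "20190105")).toDF("id", "date")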

Option 1. Get all rows of df1 that do not appear in df2. The whole row is compared, column by column:

    val df3=df1.except(df2)
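
As a rough sketch of how except behaves on the data from the question: the object name ExceptSketch, the helper rowsOnlyInLeft, and the local SparkSession setup are illustrative assumptions, not part of the original answer. Columns are matched by position, so the differing names (id vs. id_2) should not matter, and duplicate rows are removed from the result.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object ExceptSketch {
  // Hypothetical helper: rows of `left` that have no identical row in `right`.
  // Columns are compared by position, not by name.
  def rowsOnlyInLeft(left: DataFrame, right: DataFrame): DataFrame =
    left.except(right)

  def main(args: Array[String]): Unit = {
    // Assumption: a local session, for experimentation only.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("except-sketch")
      .getOrCreate()
    import spark.implicits._

    // Data copied from the question.
    val df1 = Seq((1, "A", "001", "p1"), (2, "B", "002", "p2"), (3, "C", "003", "p3"))
      .toDF("id", "name", "code", "lastname")
    val df2 = Seq((1, "A", "001", "p1"), (2, "B", "002", "p4"), (4, "D", "004", "p4"))
      .toDF("id_2", "name_2", "code_2", "lastname_2")

    // Keeps the rows with id 2 and id 3 (row order is not guaranteed).
    rowsOnlyInLeft(df1, df2).show()

    spark.stop()
  }
}
```

Because the row with id 2 differs from df2 only in lastname (p2 vs. p4), except keeps it. That is one more row than the expected result in the question, so except only fits when a match means the entire row is identical.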

Option 2. Do a left anti join on specific fields, for example 'id'. It returns the rows of df1 that have no match in df2:

    val df3 = df1.as("table1").join(df2.as("table2"), $"table1.id" === $"table2.id", "leftanti")

    df3.show()
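
Applied to the data in the question, the anti join could look like the sketch below. The names AntiJoinSketch and unmatchedOnAnyColumn are hypothetical, and combining the columns with || is my reading of the expected result (a row is dropped as soon as any one of its four values matches a row of df2), not something the question states outright.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object AntiJoinSketch {
  // Hypothetical helper: keep the rows of df1 for which NO row of df2
  // matches on id, name, code, or lastname.
  def unmatchedOnAnyColumn(df1: DataFrame, df2: DataFrame): DataFrame =
    df1.join(
      df2,
      df1("id") === df2("id_2") ||
        df1("name") === df2("name_2") ||
        df1("code") === df2("code_2") ||
        df1("lastname") === df2("lastname_2"),
      "leftanti" // returns only df1's columns, so no drop() calls are needed
    )

  def main(args: Array[String]): Unit = {
    // Assumption: a local session, for experimentation only.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("anti-join-sketch")
      .getOrCreate()
    import spark.implicits._

    // Data copied from the question.
    val df1 = Seq((1, "A", "001", "p1"), (2, "B", "002", "p2"), (3, "C", "003", "p3"))
      .toDF("id", "name", "code", "lastname")
    val df2 = Seq((1, "A", "001", "p1"), (2, "B", "002", "p4"), (4, "D", "004", "p4"))
      .toDF("id_2", "name_2", "code_2", "lastname_2")

    // Only the row (3, C, 003, p3) has no match on any column.
    unmatchedOnAnyColumn(df1, df2).show()

    spark.stop()
  }
}
```

An inner join on inequality conditions does not give this result: it pairs each row of df1 with every row of df2 that differs in all four columns, so the row with id 1 would be returned (paired with the row with id_2 = 2) and the row with id 3 would appear three times. Note that an OR condition is not an equi-join, so Spark may fall back to a nested-loop plan, which can be slow on large inputs.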