I am joining two DataFrames and want to select all columns of the left DataFrame, for example:

    val join_df = first_df.join(second_df, first_df("id") === second_df("id"), "left_outer")

In the example above, I want the equivalent of selecting first_df.*. How can I select all columns of only one DataFrame after a join?

4 Answers

With an alias:

    first_df.alias("fst").join(second_df, Seq("id"), "left_outer").select("fst.*")

We can also do it with a leftsemi join. A left semi join returns only the columns of the left DataFrame, keeping each left row that has at least one match in the right DataFrame.

Here we join two dataframes df1 and df2 based on column col1.

    df1.join(df2, df1.col("col1").equalTo(df2.col("col1")), "leftsemi") 
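
Note that leftsemi is not a drop-in replacement for a left_outer join followed by selecting the left columns: it drops left rows that have no match, and it returns each matching left row only once even when several right rows match. The sketch below is a plain-Scala model of both behaviors on in-memory collections, not Spark code; the row types FirstRow and SecondRow, the object JoinSemantics, and the sample data are all hypothetical.

```scala
// Plain-Scala model of two Spark join behaviors on in-memory rows (no Spark involved).
// Hypothetical row types standing in for the left and right DataFrames.
final case class FirstRow(id: Int, name: String)
final case class SecondRow(id: Int, score: Double)

object JoinSemantics {
  // Models join(..., "left_outer") followed by selecting only the left columns:
  // a left row is repeated once per matching right row, or kept once if nothing matches.
  def leftOuterThenSelectLeft(left: Seq[FirstRow], right: Seq[SecondRow]): Seq[FirstRow] =
    left.flatMap { l =>
      val matches = right.filter(_.id == l.id)
      if (matches.isEmpty) Seq(l) else matches.map(_ => l)
    }

  // Models join(..., "leftsemi"): a left row is kept at most once, and only if it has a match.
  def leftSemi(left: Seq[FirstRow], right: Seq[SecondRow]): Seq[FirstRow] =
    left.filter(l => right.exists(_.id == l.id))

  // Hypothetical sample data: id 1 has two matches, id 2 has one, id 3 has none.
  val firstRows: Seq[FirstRow] = Seq(FirstRow(1, "a"), FirstRow(2, "b"), FirstRow(3, "c"))
  val secondRows: Seq[SecondRow] = Seq(SecondRow(1, 0.5), SecondRow(1, 0.7), SecondRow(2, 0.9))

  val outer: Seq[FirstRow] = leftOuterThenSelectLeft(firstRows, secondRows)
  val semi: Seq[FirstRow] = leftSemi(firstRows, secondRows)
}
```

With this sample data the left_outer version yields four rows (id 1 twice, plus ids 2 and 3), while the leftsemi version yields only the rows for ids 1 and 2. Spark itself does not guarantee row order; the model only illustrates which rows and how many.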

Suppose you:

  1. Want to use the DataFrame syntax.
  2. Want to select all columns from df1 but only a couple from df2.
  3. Find it cumbersome to list df1's columns explicitly because there are so many of them.

Then, you might do the following:

    val selectColumns = df1.columns.map(df1(_)) ++ Array(df2("field1"), df2("field2"))
    df1.join(df2, df1("key") === df2("key")).select(selectColumns: _*)

Just to add one more possibility: without using an alias, I was able to do this in PySpark with

    first_df.join(second_df, "id", "left_outer").select(first_df["*"])

Not sure if this applies here, but I hope it helps.
