
I have the two dataframes below, and I need to merge them, keeping only the rows whose ID appears in both.

Dataframe 1

ID status
V1 Low
V2 Low
V3 Low

Dataframe 2

ID status
V1 High
V2 High
V6 High

Expected dataframe:

ID status
V1 Low
V1 High
V2 Low
V2 High

2 Answers


(I only know Java, not Scala, sorry.)
If I call your dataset 1 A and your dataset 2 B:

Column joinClause = A.col("ID").equalTo(B.col("ID"));

// left_semi keeps only the rows of the left dataset that have a match in the right one
Dataset<Row> A_with_B = A.join(B, joinClause, "left_semi")
    .union(
        B.join(A, joinClause, "left_semi")
    );

1 Comment

Thank you Marc Le Bihan, I have converted it to Scala and it's working.
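For reference, a rough Scala translation of the Java answer above might look like the following. It is a sketch only: the object name and the inline sample data are assumptions, reusing the example DataFrames from the question.

```scala
import org.apache.spark.sql.SparkSession

object SemiJoinUnion extends App {
  val spark = SparkSession.builder()
    .appName("Semi-join union example")
    .master("local[*]")
    .getOrCreate()
  import spark.implicits._

  val df1 = Seq(("V1", "Low"), ("V2", "Low"), ("V3", "Low")).toDF("ID", "status")
  val df2 = Seq(("V1", "High"), ("V2", "High"), ("V6", "High")).toDF("ID", "status")

  // left_semi keeps only the rows of the left DataFrame whose ID appears on the right
  val joinClause = df1("ID") === df2("ID")
  val merged = df1.join(df2, joinClause, "left_semi")
    .union(df2.join(df1, joinClause, "left_semi"))

  merged.sort("ID", "status").show()
}
```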

One option is to do an inner join and then take a union of the two resulting status columns.


import org.apache.spark.sql.SparkSession

object dev extends App{
  val spark = SparkSession.builder()
    .appName("Join and Stack Example")
    .master("local[*]")
    .getOrCreate()

  import spark.implicits._ // for creating the DataFrames

  val df1 = Seq(
    ("V1", "Low"),
    ("V2", "Low"),
    ("V3", "Low")
  ).toDF("ID", "Status")

  val df2 = Seq(
    ("V1", "High"),
    ("V2", "High"),
    ("V6", "High")
  ).toDF("ID", "Status")

  val joined = df1.as("left")
    .join(df2.as("right"), Seq("ID"), "inner")
    .select(
      $"ID",
      $"left.Status".as("Status_left"),
      $"right.Status".as("Status_right")
    )
  val leftStatus = joined.select($"ID", $"Status_left".as("Status"))
  val rightStatus = joined.select($"ID", $"Status_right".as("Status"))
  val stacked = leftStatus.union(rightStatus)

  // optionally sort if you want the exact same output as you had
  stacked.sort($"ID").show()
}
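A variation on the same idea (not part of the answer above, just a sketch assuming the same df1/df2) is to compute the set of matching IDs once, then stack both DataFrames and filter on those IDs:

```scala
import org.apache.spark.sql.SparkSession

object IntersectFilter extends App {
  val spark = SparkSession.builder()
    .appName("Intersect and filter example")
    .master("local[*]")
    .getOrCreate()
  import spark.implicits._

  val df1 = Seq(("V1", "Low"), ("V2", "Low"), ("V3", "Low")).toDF("ID", "Status")
  val df2 = Seq(("V1", "High"), ("V2", "High"), ("V6", "High")).toDF("ID", "Status")

  // IDs present in both DataFrames
  val commonIds = df1.select("ID").intersect(df2.select("ID"))

  // stack both DataFrames, then keep only rows whose ID is common to both
  val result = df1.union(df2).join(commonIds, Seq("ID"))

  result.sort("ID", "Status").show()
}
```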

1 Comment

Thank you @Vilhomaa, I tried the mentioned logic and it's working fine.
