I have two dataframes like below, and I need to merge them so that the result keeps only the IDs that appear in both, with one row per status.
Dataframe 1
| ID | status |
|---|---|
| V1 | Low |
| V2 | Low |
| V3 | Low |
Dataframe 2
| ID | status |
|---|---|
| V1 | High |
| V2 | High |
| V6 | High |
Expected dataframe:
| ID | status |
|---|---|
| V1 | Low |
| V1 | High |
| V2 | Low |
| V2 | High |
(I only know Java, not Scala, sorry)
I would say, if you call your dataset 1 `A` and your dataset 2 `B`:

```java
Column joinClause = A.col("ID").equalTo(B.col("ID"));

// Rows of A whose ID also appears in B, stacked with
// rows of B whose ID also appears in A.
Dataset<Row> A_with_B = A.join(B, joinClause, "left_semi")
        .union(
                B.join(A, joinClause, "left_semi")
        );
```
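To see why the pair of semi-joins yields exactly the expected rows, here is the same logic sketched on plain Scala collections (no Spark needed; the object and method names are illustrative, not part of any API):

```scala
object SemiJoinSketch {
  // Each dataframe modeled as a list of (ID, status) rows.
  val a = List(("V1", "Low"), ("V2", "Low"), ("V3", "Low"))
  val b = List(("V1", "High"), ("V2", "High"), ("V6", "High"))

  // A left-semi join keeps rows of the left side whose key exists on the right;
  // no columns from the right side are carried over.
  def leftSemi(left: List[(String, String)],
               right: List[(String, String)]): List[(String, String)] = {
    val rightKeys = right.map(_._1).toSet
    left.filter { case (id, _) => rightKeys.contains(id) }
  }

  // Union of the two semi-joins: A's matched rows, then B's matched rows.
  val merged = leftSemi(a, b) ++ leftSemi(b, a)
}
```

The unmatched rows (`V3` and `V6`) drop out of both semi-joins, so only the shared IDs survive with both of their statuses.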
One option is to do an inner join and then take a union of the two resulting status columns:
```scala
import org.apache.spark.sql.SparkSession

object dev extends App {
  val spark = SparkSession.builder()
    .appName("Join and Stack Example")
    .master("local[*]")
    .getOrCreate()

  import spark.implicits._ // for toDF and the $ column syntax

  val df1 = Seq(
    ("V1", "Low"),
    ("V2", "Low"),
    ("V3", "Low")
  ).toDF("ID", "Status")

  val df2 = Seq(
    ("V1", "High"),
    ("V2", "High"),
    ("V6", "High")
  ).toDF("ID", "Status")

  // Inner join on ID keeps only the IDs present in both dataframes.
  val joined = df1.as("left")
    .join(df2.as("right"), Seq("ID"), "inner")
    .select(
      $"ID",
      $"left.Status".as("Status_left"),
      $"right.Status".as("Status_right")
    )

  // Split the two status columns back out and stack them.
  val leftStatus  = joined.select($"ID", $"Status_left".as("Status"))
  val rightStatus = joined.select($"ID", $"Status_right".as("Status"))
  val stacked     = leftStatus.union(rightStatus)

  // optionally sort if you want the exact same output as you had
  stacked.sort($"ID").show()
}
```
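The join-then-stack shape above can be mimicked on plain Scala collections, which makes the row bookkeeping easy to check without spinning up a Spark session (the object name and intermediate names are illustrative only):

```scala
object InnerJoinUnionSketch {
  val df1 = List(("V1", "Low"), ("V2", "Low"), ("V3", "Low"))
  val df2 = List(("V1", "High"), ("V2", "High"), ("V6", "High"))

  // Inner join on ID: one (ID, leftStatus, rightStatus) row per matching pair.
  val joined = for {
    (idL, sL) <- df1
    (idR, sR) <- df2
    if idL == idR
  } yield (idL, sL, sR)

  // Split the joined rows back into two (ID, status) lists and stack them.
  val stacked = joined.map { case (id, sL, _) => (id, sL) } ++
                joined.map { case (id, _, sR) => (id, sR) }

  // Stable sort by ID reproduces the ordering in the expected output.
  val sorted = stacked.sortBy(_._1)
}
```

Note that in Spark, `union` matches columns by position, not by name; here both halves are already `(ID, Status)` in the same order, which is exactly why the `select`s in the answer rename both status columns to `Status` before the union.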