
I have two DataFrames:

df1:

+---+-----+--------+
|key|price|    date|
+---+-----+--------+
|  1|  1.0|20210101|
|  2|  2.0|20210101|
|  3|  3.0|20210101|
+---+-----+--------+

df2:

+---+-----+
|key|price|
+---+-----+
|  1|  1.1|
|  2|  2.2|
|  3|  3.3|
+---+-----+

I'd like to replace the price values in df1 with the price values from df2 where df1.key == df2.key.

Expected output:

+---+-----+--------+
|key|price|    date|
+---+-----+--------+
|  1|  1.1|20210101|
|  2|  2.2|20210101|
|  3|  3.3|20210101|
+---+-----+--------+

I've found some solutions in Python, but I couldn't come up with a working solution in Scala.

1 Answer


Simply join on key and drop df1's price column:

val df = df1.join(df2, Seq("key")).drop(df1("price"))

df.show
//+---+--------+-----+
//|key|    date|price|
//+---+--------+-----+
//|  1|20210101|  1.1|
//|  2|20210101|  2.2|
//|  3|20210101|  3.3|
//+---+--------+-----+
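To make the semantics of that inner join concrete, here is a minimal sketch on plain Scala collections (no Spark involved). `PriceRow`, `InnerReplace` and the extra key 4 are hypothetical, and df2 is modelled as a `Map`, which assumes its keys are unique. Like the join, it drops every df1 row whose key has no match in df2.

```scala
// Plain Scala collections, not Spark: models df1.join(df2, Seq("key")).drop(df1("price")).
// PriceRow and the sample data are hypothetical stand-ins for the DataFrames above.
final case class PriceRow(key: Int, price: Double, date: String)

object InnerReplace {
  // Inner-join semantics: a df1 row survives only if its key is also in df2,
  // and its price is taken from df2.
  def replacePrices(df1: Seq[PriceRow], df2: Map[Int, Double]): Seq[PriceRow] =
    df1.flatMap(r => df2.get(r.key).map(p => r.copy(price = p)))

  def main(args: Array[String]): Unit = {
    val df1 = Seq(
      PriceRow(1, 1.0, "20210101"),
      PriceRow(2, 2.0, "20210101"),
      PriceRow(3, 3.0, "20210101"),
      PriceRow(4, 4.0, "20210101") // no match in df2, so it is dropped
    )
    val df2 = Map(1 -> 1.1, 2 -> 2.2, 3 -> 3.3)
    replacePrices(df1, df2).foreach(println)
  }
}
```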

Or, if df1 has keys that don't appear in df2 and you want to keep their original price when there is no match, use a left join with a coalesce expression:

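import org.apache.spark.sql.functions.{coalesce, col}

// Keep every df1 row; take df2's price when the key matches, otherwise df1's
val df = df1.join(df2, Seq("key"), "left").select(
  col("key"),
  col("date"),
  coalesce(df2("price"), df1("price")).as("price")
)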
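The fallback behaviour of the left join + coalesce can be sketched the same way on plain Scala collections (again not Spark; `PriceRow` and `LeftReplace` are hypothetical names). One caveat: in Spark, `coalesce` also falls back to df1's price when df2 has the key but a null price, which this `Map[Int, Double]` model does not represent. It also assumes df2's keys are unique.

```scala
// Plain Scala collections, not Spark: models the left join + coalesce above.
// PriceRow and the sample data are hypothetical stand-ins for the DataFrames.
final case class PriceRow(key: Int, price: Double, date: String)

object LeftReplace {
  // Left-join semantics: every df1 row is kept; its price comes from df2
  // when the key matches, otherwise the original df1 price is kept.
  def replacePrices(df1: Seq[PriceRow], df2: Map[Int, Double]): Seq[PriceRow] =
    df1.map(r => r.copy(price = df2.getOrElse(r.key, r.price)))

  def main(args: Array[String]): Unit = {
    val df1 = Seq(
      PriceRow(1, 1.0, "20210101"),
      PriceRow(2, 2.0, "20210101"),
      PriceRow(3, 3.0, "20210101"),
      PriceRow(4, 4.0, "20210101") // no match in df2, keeps price 4.0
    )
    val df2 = Map(1 -> 1.1, 2 -> 2.2, 3 -> 3.3)
    replacePrices(df1, df2).foreach(println)
  }
}
```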

3 Comments

Hi, I've tried your solution, and the order of the columns changed in the output df, i.e.: key, date, price. Is it possible to preserve the order?
@dmgcodevil Hi! Just use .select("key", "price", "date") if you want them in that order.
This is an easy example. What about variable JSON input replacing that variable input in a df?
