
I would like to merge two Spark DataFrames (Scala). The first DataFrame contains only one row; the second has multiple rows. I would like to merge them, copying the address/phone column values from the first DataFrame to all rows of the second. Is there a way to do this using Spark operations?

DF1

name age address phone
ABC  25  XYZ     00000

DF2

    name   age

    Bill   30
    Steve  40
    Jackie 50

Final DF

name  age address phone
ABC    25  XYZ     00000
Bill   30  XYZ     00000
Steve  40  XYZ     00000
Jackie 50  XYZ     00000

2 Answers


There is a simple way to do it:

import org.apache.spark.sql.functions.lit

// Pull the single row's address and phone values to the driver
val row = df1.select("address", "phone").collect()(0)

// Add them as constant columns to df2, then append df1's own row
val finalDF = df2.withColumn("address", lit(row(0)))
  .withColumn("phone", lit(row(1)))
  .union(df1)
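For intuition, the same "broadcast one row to many" logic can be sketched with plain Scala collections, no Spark required. This is only an illustration; `Person` and `enrich` are hypothetical names, not part of the Spark API or the answer above.

```scala
// Plain-Scala sketch of the operation: copy the single row's address/phone
// onto every (name, age) pair, then prepend the single row itself,
// mirroring the withColumn + union above.
case class Person(name: String, age: Int, address: String, phone: String)

def enrich(single: Person, others: Seq[(String, Int)]): Seq[Person] =
  single +: others.map { case (n, a) => Person(n, a, single.address, single.phone) }

val abc = Person("ABC", 25, "XYZ", "00000")
val merged = enrich(abc, Seq("Bill" -> 30, "Steve" -> 40, "Jackie" -> 50))
merged.foreach(println)
```

Running it prints four `Person` rows, each carrying the address `XYZ` and phone `00000` from the single source row.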


Since df1 has only one row, you can also crossJoin df2 with the address and phone columns of df1, then union df1 back in:

val results = df2
  .crossJoin(df1.select("address", "phone"))
  .union(df1)

