Assume I have the following DataFrames:

val df1 = sc.parallelize(Seq("a1" -> "a2", "b1" -> "b2", "c1" -> "c2")).toDF("a", "b")
val df2 = sc.parallelize(Seq("aa1" -> "aa2", "bb1" -> "bb2")).toDF("aa", "bb")

And I want the following:

 | a  | b  | aa  | bb  |
 ----------------------
 | a1 | a2 | aa1 | aa2 |
 | a1 | a2 | bb1 | bb2 |
 | b1 | b2 | aa1 | aa2 |
 | b1 | b2 | bb1 | bb2 |
 | c1 | c2 | aa1 | aa2 |
 | c1 | c2 | bb1 | bb2 |

That is, each row of df1 should be paired with every row of df2. This is how I am currently doing it:

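import org.apache.spark.sql.functions.lit

// Add the same constant key to both sides so that every row matches every row.
val df1_dummy = df1.withColumn("dummy_df1", lit("dummy"))
val df2_dummy = df2.withColumn("dummy_df2", lit("dummy"))
val desired_result = df1_dummy
                       .join(df2_dummy, $"dummy_df1" === $"dummy_df2", "left")
                       .drop("dummy_df1")
                       .drop("dummy_df2")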

This gives the desired result, but it feels like a workaround. Is there a more efficient or more idiomatic way to do this? Any recommendations?

1 Answer

That's what crossJoin is for:

val result = df1.crossJoin(df2)

result.show()
// +---+---+---+---+
// |a  |b  |aa |bb |
// +---+---+---+---+
// |a1 |a2 |aa1|aa2|
// |a1 |a2 |bb1|bb2|
// |b1 |b2 |aa1|aa2|
// |b1 |b2 |bb1|bb2|
// |c1 |c2 |aa1|aa2|
// |c1 |c2 |bb1|bb2|
// +---+---+---+---+
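
As a sanity check, a Cartesian product should contain count(df1) * count(df2) rows. The sketch below assumes Spark 2.1 or later (where crossJoin is available) and builds its own local SparkSession purely for illustration; in spark-shell the `spark` value already exists. The broadcast hint at the end is an optional addition that is not part of the answer above, and whether it helps depends on how small the second DataFrame really is.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.broadcast

// Assumption: a local session created only for this example.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("cross-join-sketch")
  .getOrCreate()
import spark.implicits._

val df1 = Seq("a1" -> "a2", "b1" -> "b2", "c1" -> "c2").toDF("a", "b")
val df2 = Seq("aa1" -> "aa2", "bb1" -> "bb2").toDF("aa", "bb")

// Explicit Cartesian product: no dummy key column is needed.
val result = df1.crossJoin(df2)

// Every row of df1 is paired with every row of df2: 3 * 2 rows here.
val expectedRows = df1.count() * df2.count()
assert(result.count() == expectedRows)

// Optional: hint that the small side can be broadcast to each executor.
val hinted = df1.crossJoin(broadcast(df2))
assert(hinted.count() == expectedRows)
```

Because the output grows multiplicatively, it is worth checking both input sizes before running this on large DataFrames.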