13
  • DataFrame a contains columns x, y, z, k
  • DataFrame b contains columns x, y, a

    a.join(b, <condition in Java to join on x and y>) ???
    

I tried using

a.join(b, a.col("x").equalTo(b.col("x")) && a.col("y").equalTo(b.col("y")), "inner")

But Java throws a compile error because && cannot be applied to Column operands.

2 Answers

35

Spark SQL provides a group of methods on Column, marked as java_expr_ops, which are designed for Java interoperability. This group includes an and method (see also or) which can be used here:

a.col("x").equalTo(b.col("x")).and(a.col("y").equalTo(b.col("y")))
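To see why chained method calls work where && does not, here is a minimal, self-contained sketch. The Cond class below is a hypothetical stand-in for Spark's Column (not the real API): equalTo and and combine predicates through method calls, the same shape as the java_expr_ops methods above.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Predicate;

public class ColumnAndDemo {
    // Hypothetical stand-in for Spark's Column, for illustration only.
    // Conditions are combined by calling .and(), never by the && operator,
    // which only works on primitive booleans in Java.
    static class Cond {
        final Predicate<Map<String, Integer>> p;
        Cond(Predicate<Map<String, Integer>> p) { this.p = p; }

        // Build an equality condition between two keys of a row.
        static Cond equalTo(String left, String right) {
            return new Cond(row -> row.get(left).equals(row.get(right)));
        }

        // Conjunction by method call, mirroring Column.and().
        Cond and(Cond other) { return new Cond(p.and(other.p)); }

        boolean test(Map<String, Integer> row) { return p.test(row); }
    }

    public static void main(String[] args) {
        // Same shape as:
        // a.col("x").equalTo(b.col("x")).and(a.col("y").equalTo(b.col("y")))
        Cond cond = Cond.equalTo("a.x", "b.x").and(Cond.equalTo("a.y", "b.y"));

        Map<String, Integer> row = new HashMap<>();
        row.put("a.x", 1); row.put("b.x", 1);
        row.put("a.y", 2); row.put("b.y", 2);
        System.out.println(cond.test(row)); // prints true

        row.put("b.y", 3);
        System.out.println(cond.test(row)); // prints false
    }
}
```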

1 Comment

How can the above condition be built dynamically with the Java API when the number of columns is not fixed? It could be 2, 3, 4, 7 or more.
1

If you want to join on multiple columns, you can do something like this:

a.join(b, scalaSeq, joinType)

You can store your column names in a Java List and convert it to a Scala Seq. Conversion of a Java List to a Scala Seq:

scalaSeq = JavaConverters.asScalaIteratorConverter(list.iterator()).asScala().toSeq();

Example: a = a.join(b, scalaSeq, "inner");

Note: a dynamic number of columns is easily supported this way.

