I have two datasets, Distance and Customer. I want to find which rows of the Customer dataset have an id that appears in id_5 of the Distance dataset, where id_5 is an array of ids. Any help is greatly appreciated.

case class Distance(zip: String, id_5: Array[Int])
val dist = Seq(Distance("72712",Array(72713,72714,72715)))
val distDS=dist.toDS()

case class Customer (cust_id: Int, id: String)
val c = Seq(Customer(1,"72713"),Customer(2,"72714"),Customer(3,"72720"))
val custDS = c.toDS()

val res = distDS.joinWith(custDS,distDS.col("id_5"(??????)) === custDS.col("id"))

1 Answer

Use array_contains:

import org.apache.spark.sql.functions.expr

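// The question asks to match Customer.id (a String), so cast it to Int
// before comparing it with the Int elements of id_5.
distDS.joinWith(custDS, expr("array_contains(id_5, CAST(id AS INT))"))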
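
For reference, here is a minimal end-to-end sketch of the array_contains join on the data from the question. It assumes a local SparkSession (in spark-shell, `getOrCreate()` simply returns the existing one) and that the column to match is Customer.id, which is a String and is therefore cast to Int inside the expression. The names `res` and `matchedIds` are only for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.expr

case class Distance(zip: String, id_5: Array[Int])
case class Customer(cust_id: Int, id: String)

// Assumption: local mode, for trying the snippet outside a cluster.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("array-contains-join-sketch")
  .getOrCreate()
import spark.implicits._

val distDS = Seq(Distance("72712", Array(72713, 72714, 72715))).toDS()
val custDS = Seq(Customer(1, "72713"), Customer(2, "72714"), Customer(3, "72720")).toDS()

// Inner join: keep each (Distance, Customer) pair where the customer's id,
// cast to Int, is one of the elements of id_5.
val res = distDS.joinWith(custDS, expr("array_contains(id_5, CAST(id AS INT))"))

// cust_id values of the customers that matched.
val matchedIds = res.collect().map(_._2.cust_id).sorted
```

With this data, customers 1 and 2 match and customer 3 (id "72720") does not. If Customer.id were declared as Int, the cast could be dropped.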
1 Comment

Mark it as accepted or useful if you find it useful.
