0

I have

Dataset A: uuid, listOfLocationsIds, name

Dataset B: locationId, latitude, longitude

A.listOfLocationIds can have multiple locationIds

How can I do a join on A and B with each value in listOfLocationsIds? So if there are two values in listOfLocationIds, I would want the join to consider each locationId in the listOfLocationIds

A.join(B, A.listOfLocationsIds[0] == B.locationId, "left")
A.join(B, A.listOfLocationsIds[1] == B.locationId, "left")

1 Answer 1

1

Assume dataset A is called df with this content:

+----+-----------------+-----+
|uuid|listOfLocationsId|name |
+----+-----------------+-----+
|1   |[1, 2, 3]        |name1|
|2   |[1, 3]           |name1|
+----+-----------------+-----+

and dataset B is called df2 with this content:

+----------+--------+---------+
|locationId|latitude|longitude|
+----------+--------+---------+
|2         |5       |7        |
+----------+--------+---------+

And we do an array_contains join:

df = df.join(df2, 
  array_contains(col("listOfLocationsId"), col("locationId")), "left"
)

The final result:

+----+-----------------+-----+----------+--------+---------+
|uuid|listOfLocationsId|name |locationId|latitude|longitude|
+----+-----------------+-----+----------+--------+---------+
|1   |[1, 2, 3]        |name1|2         |5       |7        |
|2   |[1, 3]           |name1|null      |null    |null     |
+----+-----------------+-----+----------+--------+---------+

Good luck!

Sign up to request clarification or add additional context in comments.

2 Comments

what if the list can have unknown number of elements?
edited the comment, guess this is what you are looking for

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.