
I have the following situation: I have a DataFrame with an 'id' column and an array column 'candidates'. For every row, I want to pair the 'id' with each element of its array and save the resulting pairs in a new DataFrame. For example:

This is the original dataframe:

+---+----------+
| id|candidates|
+---+----------+
|  1|    [2, 3]|
|  2|       [3]|
+---+----------+

After the computation, it should look like this:

+---+---+
|id1|id2|
+---+---+
|  1|  2|
|  1|  3|
|  2|  3|
+---+---+

Does anyone have an idea how to solve this?

  • Simply use the explode function. (Commented Nov 6, 2018 at 15:48)
  • And how can I use it for all array elements? (Commented Nov 6, 2018 at 15:52)

2 Answers


OK, thanks @cheseaux, I found the answer! There is simply the explode_outer function:

    candidatesDF.withColumn("candidates", explode_outer($"candidates")).show

1 Comment

Use explode if you don't want any output row when the array is null or empty. If you use explode_outer, such a row is kept with null in the exploded column, which might not be desired.
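To make the difference concrete without a Spark cluster, here is a plain Scala model of the two behaviours using ordinary collections. It is only a sketch of the semantics, not Spark's implementation: None stands in for a null array, and the names ExplodeSemantics, explodeLike, explodeOuterLike and sampleRows are hypothetical.

```scala
// Hypothetical plain-collections model of Spark's explode vs explode_outer (not Spark code).
object ExplodeSemantics {
  // One "row": (id, candidates), where None stands for a null array.
  type Row = (Int, Option[Seq[Int]])

  // The question's two rows plus an empty array (id 3) and a null array (id 4).
  val sampleRows: Seq[Row] =
    Seq((1, Some(Seq(2, 3))), (2, Some(Seq(3))), (3, Some(Seq.empty[Int])), (4, None))

  // Models explode: one pair per element; null or empty arrays produce nothing.
  def explodeLike(rows: Seq[Row]): Seq[(Int, Int)] =
    rows.flatMap { case (id, arr) => arr.getOrElse(Seq.empty[Int]).map(c => (id, c)) }

  // Models explode_outer: null or empty arrays produce a single pair whose
  // second element is None (Spark would put null in that column).
  def explodeOuterLike(rows: Seq[Row]): Seq[(Int, Option[Int])] =
    rows.flatMap { case (id, arr) =>
      val elems = arr.getOrElse(Seq.empty[Int])
      if (elems.isEmpty) Seq((id, Option.empty[Int]))
      else elems.map(c => (id, Option(c)))
    }

  def main(args: Array[String]): Unit = {
    println(explodeLike(sampleRows))
    println(explodeOuterLike(sampleRows))
  }
}
```

Here explodeLike keeps only the three pairs from the question, while explodeOuterLike additionally emits (3, None) and (4, None) for the empty and null arrays.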

Simply explode the array column.

    candidatesDF.withColumn("id2", explode($"candidates"))
      .select($"id".as("id1"), $"id2")
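For completeness, a self-contained sketch of the whole job, assuming Spark 2.x or later with spark-sql on the classpath. It uses a single select so that only the two requested columns remain; the object name CandidatePairs and the helper pairs are hypothetical, and the sample data is the question's input table.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, explode}

// Hypothetical example object; not part of any library.
object CandidatePairs {
  // Turns (id, candidates: array) rows into one (id1, id2) row per array element.
  // Rows whose array is null or empty are dropped (explode semantics).
  def pairs(df: DataFrame): DataFrame =
    df.select(col("id").as("id1"), explode(col("candidates")).as("id2"))

  def main(args: Array[String]): Unit = {
    // Local session for illustration; in spark-shell a `spark` session already exists.
    val spark = SparkSession.builder()
      .appName("CandidatePairs")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Sample data matching the question's input table.
    val candidatesDF = Seq((1, Seq(2, 3)), (2, Seq(3))).toDF("id", "candidates")

    pairs(candidatesDF).show()
    spark.stop()
  }
}
```

Running main should print the three (id1, id2) pairs from the question, although Spark does not guarantee row order in general. Swapping explode for explode_outer keeps rows with a null or empty array, with null in id2.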
