3

So I have a question regarding pyspark. I have a dataframe that looks like this:

+---+------------+
| id|        list|
+---+------------+
|  2|[3, 5, 4, 2]|
+---+------------+
|  3|[4, 5, 3, 2]|
+---+------------+

And I would like to explode lists it into multiple rows and keeping information about which position did each element of the list had in a separate column. The result should look like this:

+---+------------+------------+
| id|    listitem|        rank|
+---+------------+------------+
|  2|           3|           1|
+---+------------+------------+
|  2|           5|           2|
+---+------------+------------+
|  2|           4|           3|
+---+------------+------------+
|  2|           2|           4|
+---+------------+------------+
|  3|           4|           1|
+---+------------+------------+
|  3|           5|           2|
+---+------------+------------+
|  3|           3|           3|
+---+------------+------------+
|  3|           2|           4|
+---+------------+------------+

The rank column has the index+1 of the position each element had in the list. Any suggestions on the most optimal code to achieve it?

1 Answer 1

5

You can use posexplode() or posexplode_outer() function to get desired result.

df = spark.createDataFrame([(2, [3, 5, 4, 2]), (3, [4, 5, 3, 2])], ["id", "list"])

df.select('id',posexplode_outer('list').alias('rank', 'listitem')) \
.withColumn('rank', col('rank') + 1).show()

+---+----+--------+
| id|rank|listitem|
+---+----+--------+
|  2|   1|       3|
|  2|   2|       5|
|  2|   3|       4|
|  2|   4|       2|
|  3|   1|       4|
|  3|   2|       5|
|  3|   3|       3|
|  3|   4|       2|
+---+----+--------+
Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.