How to explode multiple arrays using Python

Question

I have a dataframe whose columns contain lists, similar to the following. The lists are not the same length in every column.

Name   Age   Subjects                    Grades
[Bob]  [16]  [Maths,Physics,Chemistry]   [A,B,C]

I want to explode the dataframe so that I get the following output:

Name  Age  Subjects   Grades
Bob   16   Maths      A
Bob   16   Physics    B
Bob   16   Chemistry  C

I tried using selectExpr along with inline, arrays_zip, etc.

Martynas · Accepted Answer · 2023-01-25 10:16:11Z

You can first apply arrays_zip to the array columns and then explode the result.

Sample:

from pyspark.sql import SparkSession
import pyspark.sql.functions as F

spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame(
    [
        ('Bob', 16, ["Maths", "Physics", "Chemistry"], ["A", "B", "C"]),
        ('Alice', 17, ["Maths", "Physics"], ["A", "B"])
    ],
    ["name", "age", "subjects", "grades"]
)

df.show(truncate=False)

exploded = (
    df
        .withColumn(
            "exploded",
            F.explode(F.arrays_zip(F.col("subjects"), F.col("grades")))
        )
        .select(
            F.col("name"),
            F.col("age"),
            F.col("exploded.subjects").alias("subject"),
            F.col("exploded.grades").alias("grade"),
        )
)

exploded.show(truncate=False)

Output:

+-----+---+---------------------------+---------+
|name |age|subjects                   |grades   |
+-----+---+---------------------------+---------+
|Bob  |16 |[Maths, Physics, Chemistry]|[A, B, C]|
|Alice|17 |[Maths, Physics]           |[A, B]   |
+-----+---+---------------------------+---------+

+-----+---+---------+-----+
|name |age|subject  |grade|
+-----+---+---------+-----+
|Bob  |16 |Maths    |A    |
|Bob  |16 |Physics  |B    |
|Bob  |16 |Chemistry|C    |
|Alice|17 |Maths    |A    |
|Alice|17 |Physics  |B    |
+-----+---+---------+-----+

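This answer assumes that subjects and grades have the same length within each row. If they differ, arrays_zip pads the shorter array with null, so the exploded rows will contain null values and some extra handling is needed.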
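To make the unequal-length case concrete, here is a minimal plain-Python sketch that models what zipping and then exploding does to a single row. It is not Spark code: the helper name zip_and_explode and the dict-based row are hypothetical, chosen only for illustration, and itertools.zip_longest stands in for the null padding that arrays_zip applies to shorter arrays.

```python
from itertools import zip_longest


def zip_and_explode(row, array_cols):
    """Model of explode(arrays_zip(...)) for one row (hypothetical helper).

    `row` is a dict; `array_cols` names the list-valued columns to zip.
    Shorter lists are padded with None, standing in for Spark's null.
    """
    # Columns that are not zipped are repeated unchanged on every output row.
    scalars = {k: v for k, v in row.items() if k not in array_cols}
    arrays = [row[c] for c in array_cols]

    exploded_rows = []
    for values in zip_longest(*arrays, fillvalue=None):
        exploded_rows.append({**scalars, **dict(zip(array_cols, values))})
    return exploded_rows


# One more subject than grades, so the last row has no grade.
row = {
    "name": "Bob",
    "age": 16,
    "subjects": ["Maths", "Physics", "Chemistry"],
    "grades": ["A", "B"],
}

exploded = zip_and_explode(row, ["subjects", "grades"])
for r in exploded:
    print(r)

# One possible "extra handling": keep only rows where every zipped value exists.
complete = [r for r in exploded if r["grades"] is not None]
```

In Spark itself, the equivalent choice is whether to drop the padded rows (for example with na.drop on the exploded column) or to keep them and fill in a placeholder (for example with fillna); which one is right depends on whether a subject without a grade is meaningful in your data.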
