You can first combine the two arrays with arrays_zip and then explode the result.
Sample:
from pyspark.sql import SparkSession
import pyspark.sql.functions as F

spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame(
    [
        ('Bob', 16, ["Maths", "Physics", "Chemistry"], ["A", "B", "C"]),
        ('Alice', 17, ["Maths", "Physics"], ["A", "B"])
    ],
    ["name", "age", "subjects", "grades"]
)
df.show(truncate=False)

exploded = (
    df
    .withColumn(
        "exploded",
        F.explode(F.arrays_zip(F.col("subjects"), F.col("grades")))
    )
    .select(
        F.col("name"),
        F.col("age"),
        F.col("exploded.subjects").alias("subject"),
        F.col("exploded.grades").alias("grade"),
    )
)
exploded.show(truncate=False)
Output:
+-----+---+---------------------------+---------+
|name |age|subjects                   |grades   |
+-----+---+---------------------------+---------+
|Bob  |16 |[Maths, Physics, Chemistry]|[A, B, C]|
|Alice|17 |[Maths, Physics]           |[A, B]   |
+-----+---+---------------------------+---------+

+-----+---+---------+-----+
|name |age|subject  |grade|
+-----+---+---------+-----+
|Bob  |16 |Maths    |A    |
|Bob  |16 |Physics  |B    |
|Bob  |16 |Chemistry|C    |
|Alice|17 |Maths    |A    |
|Alice|17 |Physics  |B    |
+-----+---+---------+-----+
This answer assumes that subjects and grades have the same length in every row. If they do not, arrays_zip fills the missing elements of the shorter array with null, so some extra handling is needed.
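To see what unequal lengths would look like, here is a rough plain-Python sketch of the per-row behaviour (no Spark needed). It only models arrays_zip followed by explode under the assumption that both arrays are non-null. The helper zip_and_explode and the "Carol" row are made up for illustration and are not part of any Spark API or of the original data.

```python
from itertools import zip_longest


def zip_and_explode(row):
    """Model arrays_zip + explode for one row (hypothetical helper, not a Spark API)."""
    name, age, subjects, grades = row
    # zip_longest pads the shorter list with None, which mirrors the
    # null padding arrays_zip applies when the arrays differ in length.
    return [
        (name, age, subject, grade)
        for subject, grade in zip_longest(subjects, grades, fillvalue=None)
    ]


rows = [
    ("Bob", 16, ["Maths", "Physics", "Chemistry"], ["A", "B", "C"]),
    ("Carol", 15, ["Maths", "Physics"], ["A"]),  # hypothetical row with a missing grade
]

# One output tuple per zipped element, like one DataFrame row per exploded struct.
exploded = [out for row in rows for out in zip_and_explode(row)]
for out in exploded:
    print(out)
```

The last tuple for Carol carries None as the grade. In Spark, one possible way to deal with this is to keep only rows where F.size("subjects") equals F.size("grades") before exploding, or to drop rows with a null subject or grade afterwards, depending on which behaviour you want.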