
I have the following Spark DataFrame schema:

root
 |-- UserId: long (nullable = true)
 |-- VisitedCountry: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- Name: string (nullable = false)
 |    |    |-- Id: long (nullable = false)

I want to convert each VisitedCountry element into a separate row in a new DataFrame:

root
 |-- UserId: long (nullable = true)
 |-- CountryName: string (nullable = false)
 |-- CountryId: long (nullable = false)

  • Possible duplicate of Expand array-of-structs into columns in PySpark. Commented Jun 25, 2019 at 9:41
  • @user10938362 The provided reference doesn't have any answers. Commented Jun 25, 2019 at 9:42
  • @Arash: The linked question itself (and the answer) actually have the solution you are after. First use explode, then put the values in the struct into their own separate columns. (And no, I'm not the one who downvoted, if you were wondering.) Commented Jun 25, 2019 at 9:56

2 Answers


You would probably want to use the explode function.

Check out https://spark.apache.org/docs/latest/api/python/pyspark.sql.html?highlight=explode

I'm not sure how it would work with structs.
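
For what it's worth, explode does handle structs: each array element becomes its own row, and the struct's fields can then be addressed with dot notation. A minimal PySpark sketch, assuming the source DataFrame is named df (as in the other answer) and using the column names from the question's schema:

from pyspark.sql.functions import col, explode

# One output row per element of the VisitedCountry array;
# the struct's Name/Id fields are then promoted to top-level columns.
result = (
    df.withColumn("country", explode(col("VisitedCountry")))
      .select(
          col("UserId"),
          col("country.Name").alias("CountryName"),
          col("country.Id").alias("CountryId"),
      )
)
result.printSchema()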



Explode & select, in Scala:

df.withColumn("exploded", explode($"VisitedCountry"))
  .select($"UserId",
    $"exploded.Name".alias("CountryName"),
    $"exploded.ID".alias("CountryId")
  )
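
If you don't need to rename the columns, .select($"UserId", $"exploded.*") expands every field of the struct at once, keeping the original names (Name, Id).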
