
I have the following Spark DataFrame schema:

root
 |-- UserId: long (nullable = true)
 |-- VisitedCountry: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- Name: string (nullable = false)
 |    |    |-- Id: long (nullable = false)

I want to convert each VisitedCountry element into a separate row in a new DataFrame:

root
 |-- UserId: long (nullable = true)
 |-- CountryName: string (nullable = false)
 |-- CountryId: long (nullable = false)

  • Possible duplicate of Expand array-of-structs into columns in PySpark. Commented Jun 25, 2019 at 9:41
  • @user10938362 The provided reference doesn't have any answers. Commented Jun 25, 2019 at 9:42
  • @Arash: The linked question itself (and the answer) actually have the solution you are after. First use explode, then put the values in the struct into their own separate columns. (And no, I'm not the one who downvoted, if you were wondering.) Commented Jun 25, 2019 at 9:56

2 Answers


You would probably want to use the explode function.

Check out https://spark.apache.org/docs/latest/api/python/pyspark.sql.html?highlight=explode

I'm not sure how it would work with structs.
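
For what it's worth, explode does handle structs: each array element becomes its own row, and the struct's fields can then be addressed with dot notation. A minimal PySpark sketch, assuming the source DataFrame is named df (as in the other answer) and using the column names from the question's schema:

from pyspark.sql.functions import col, explode

# One output row per element of the VisitedCountry array;
# the struct's Name/Id fields are then promoted to top-level columns.
result = (
    df.withColumn("country", explode(col("VisitedCountry")))
      .select(
          col("UserId"),
          col("country.Name").alias("CountryName"),
          col("country.Id").alias("CountryId"),
      )
)
result.printSchema()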



Explode & select, in Scala:

df.withColumn("exploded", explode($"VisitedCountry"))
  .select($"UserId",
    $"exploded.Name".alias("CountryName"),
    $"exploded.ID".alias("CountryId")
  )
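
If you don't need to rename the columns, .select($"UserId", $"exploded.*") expands every field of the struct at once, keeping the original names (Name, Id).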
