
I have a column with data like this:

[[[-77.1082606, 38.935738]] ,Point] 

I want it split out like:

  column 1          column 2        column 3
 -77.1082606      38.935738           Point

How can I do this in PySpark, or alternatively Scala (Databricks 3.0)? I know how to explode columns, but not how to split up these structs. Thanks!

EDIT: Here is the schema of the column:

|-- geometry: struct (nullable = true)
 |    |-- coordinates: string (nullable = false)
 |    |-- type: string (nullable = false)
What's the type? array<array<>>? Please post the result of printSchema. Commented Sep 20, 2017 at 20:24

1 Answer


You can use regexp_replace() to get rid of the square brackets, and then split() the resulting string by the comma into separate columns.

from pyspark.sql.functions import regexp_replace, split, col

# Strip the square brackets from the coordinates string, then split on the comma.
df.select(regexp_replace(df.geometry.coordinates, r"[\[\]]", "").alias("coordinates"),
          df.geometry.type.alias("col3")) \
  .withColumn("arr", split(col("coordinates"), ",")) \
  .select(col("arr")[0].alias("col1"),
          col("arr")[1].alias("col2"),
          "col3") \
  .show(truncate=False)
+-----------+----------+-----+
|col1       |col2      |col3 |
+-----------+----------+-----+
|-77.1082606| 38.935738|Point|
+-----------+----------+-----+
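The regex does the heavy lifting here: the character class [\[\]] matches either bracket, so regexp_replace strips all of them before the split. A quick sanity check of the same pattern in plain Python (outside Spark), using a sample string matching the question's data:

```python
import re

# Sample coordinates string in the same shape as the question's data.
raw = "[[-77.1082606, 38.935738]]"

# Remove every '[' and ']' character, mirroring regexp_replace in the answer.
cleaned = re.sub(r"[\[\]]", "", raw)   # "-77.1082606, 38.935738"

# Split on the comma and strip the leading space from the second value.
col1, col2 = [s.strip() for s in cleaned.split(",")]
print(col1, col2)  # -77.1082606 38.935738
```

Note that Spark's split() leaves the leading space on the second element (hence " 38.935738" in the output above); wrap the values in trim() if you need them clean.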

4 Comments

I couldn't recall the syntax - you were faster :D +1 and I suggest @AshleyO to also give +1 and accept :)
I should have been more clear, the data is all in one struct. I've edited to display the information more clearly. I'm testing to see if this concept can help though
so you have ["[[-77.1082606, 38.935738]]" ,"Point"] ?
Correct. All in one column
