Spark DataFrame extract value from array with where

Question

I have a dataframe with the following schema:

root
 |-- id: long (nullable = true)
 |-- raw_data: struct (nullable = true)
 |    |-- address_components: array (nullable = true)
 |    |    |-- element: struct (containsNull = true)
 |    |    |    |-- long_name: string (nullable = true)
 |    |    |    |-- short_name: string (nullable = true)
 |    |    |    |-- types: array (nullable = true)
 |    |    |    |    |-- element: string (containsNull = true)

Example of address_components:

{
   "address_components":[
      {
         "long_name":"Portugal",
         "short_name":"PT",
         "types":[
            "country",
            "political"
         ]
      },
      {
         "long_name":"8200-591",
         "short_name":"8200-591",
         "types":[
            "postal_code"
         ]
      }
   ]
}

I want to create a new root-level column, country: string, that should contain PT. However, the selection should be based on array_contains(col("types"), "country").

I figured part of it out like this:

df = (
    df.withColumn("country", expr("filter(raw_data.address_components, c -> array_contains(c.types, 'country'))"))
      .withColumn("country", col("country").getItem(0).getItem("short_name"))
)

Is there a smarter/shorter way to do this?

Mazzy · Accepted Answer · 2022-04-20 09:44:42Z

I fixed it using expressions in combination with withColumn:

df = df.withColumn("country", expr("filter(raw_data.address_components, c -> array_contains(c.types, 'country'))[0].short_name"))
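
To make it clearer what that expression evaluates to for a single row, here is a plain-Python sketch (not Spark code) that mimics filter(...)[0].short_name on the example record from the question. The helper name country_short_name is hypothetical. It also assumes that indexing an empty filter result gives null rather than an error, which, as far as I know, is Spark's behavior when ANSI mode is off.

```python
from typing import Optional

# Example record copied from the question.
record = {
    "address_components": [
        {"long_name": "Portugal", "short_name": "PT", "types": ["country", "political"]},
        {"long_name": "8200-591", "short_name": "8200-591", "types": ["postal_code"]},
    ]
}


def country_short_name(components: Optional[list]) -> Optional[str]:
    """Plain-Python stand-in (hypothetical helper) for
    filter(components, c -> array_contains(c.types, 'country'))[0].short_name."""
    if components is None:
        return None
    # Keep only the components whose types array contains 'country'.
    matches = [c for c in components if "country" in (c.get("types") or [])]
    # Assumption: no match yields None, mirroring a null result in Spark.
    return matches[0]["short_name"] if matches else None


print(country_short_name(record["address_components"]))
```

If several components carry the 'country' type, only the first one is used, which is the same choice the [0] index makes in the Spark expression.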
