Pyspark: Create an array of struct from another array of struct

Question

I'm using PySpark 2.4 and would like to create df_2 from df_1:

df_1:

root
 |-- request: array (nullable = false)
 |    |-- address: struct (nullable = false)
 |    |    |-- street: string (nullable = false)
 |    |    |-- postcode: string (nullable = false)

df_2:

root
 |-- request: array (nullable = false)
 |    |-- address: struct (nullable = false)
 |    |    |-- street: string (nullable = false)

I know a UDF is one way to do this, but are there other ways, such as map(), to achieve the same result?

blackbishop · Accepted Answer · 2020-02-19 12:16:51Z


Use the transform higher-order function, available in Spark SQL since 2.4:

from pyspark.sql.functions import expr
df_2 = df_1.withColumn("request", expr("transform(request, x -> struct(x.street) as address)"))

For each element of the request array, this selects only the street field and wraps it in a new struct.
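To make the per-element behaviour concrete, here is a plain-Python model of what the lambda does to one row's request array. It is not Spark code: structs are modelled as dicts, and the helper name keep_street and the sample addresses are made up for illustration.

```python
from typing import Dict, List


def keep_street(request: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Plain-Python model of transform(request, x -> struct(x.street))."""
    # For every struct x in the array, build a new struct holding only x.street.
    return [{"street": x["street"]} for x in request]


# One row's request array, with hypothetical sample values.
row = [
    {"street": "1 Main St", "postcode": "AB1 2CD"},
    {"street": "2 High St", "postcode": "EF3 4GH"},
]
print(keep_street(row))
```

The array keeps its length and order; only the shape of each element changes.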


1 Comment

Blue Clouds Over a year ago

AnalysisException: cannot resolve 'struct(namedlambdavariable().street )' due to data type mismatch: Only foldable string expressions are allowed to appear at odd position, got: NamePlaceholder;
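The message suggests Spark could not derive a field name for x.street inside the lambda. A possible workaround, not confirmed by the answer's author, is to name the field explicitly with named_struct instead of struct. The sketch below assumes the schema from the question; keep_fields_expr and drop_postcode are hypothetical helper names.

```python
from typing import List


def keep_fields_expr(array_col: str, fields: List[str]) -> str:
    """Build a transform() expression that keeps only `fields` of each struct.

    Hypothetical helper: every field name is passed to named_struct as a
    string literal, so Spark does not have to infer it from the lambda variable.
    """
    pairs = ", ".join(f"'{name}', x.{name}" for name in fields)
    return f"transform({array_col}, x -> named_struct({pairs}))"


def drop_postcode(df_1):
    """Apply the expression to df_1 (assumes the schema shown in the question)."""
    # Imported here so the string helper above can be used without Spark installed.
    from pyspark.sql.functions import expr
    return df_1.withColumn("request", expr(keep_fields_expr("request", ["street"])))


print(keep_fields_expr("request", ["street"]))
# transform(request, x -> named_struct('street', x.street))
```

With a DataFrame in hand, df_2 = drop_postcode(df_1) would apply it; running df_2.printSchema() afterwards is a quick way to check that only street remains.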
