I have a DataFrame like the one below:

id  contact_persons
-----------------------
1   [[abc, [email protected], 896676, manager],[pqr, [email protected], 89809043, director],[stu, [email protected], 09909343, programmer]]    

Its schema looks like this:

root
 |-- id: string (nullable = true)
 |-- contact_persons: array (nullable = true)
 |    |-- element: array (containsNull = true)
 |    |    |-- element: string (containsNull = true)

I need to convert this DataFrame so that it has the schema below:

 root
 |-- id: string (nullable = true)
 |-- contact_persons: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- emails: string (nullable = true)
 |    |    |-- name: string (nullable = true)
 |    |    |-- phone: string (nullable = true)
 |    |    |-- roles: string (nullable = true)

I know PySpark has a struct function, but I don't know how to use it in this scenario, because the outer array is dynamically sized.

1 Answer

You can use a TRANSFORM expression to convert each inner array into a struct:

import pyspark.sql.functions as f

df = spark.createDataFrame([
  [1, [['abc', '[email protected]', '896676', 'manager'],
       ['pqr', '[email protected]', '89809043', 'director'],
       ['stu', '[email protected]', '09909343', 'programmer']]]
], schema='id string, contact_persons array<array<string>>')

expression = 'TRANSFORM(contact_persons, el -> STRUCT(el[0] AS name, el[1] AS emails, el[2] AS phone, el[3] AS roles))'
output_df = df.withColumn('contact_persons', f.expr(expression))

# output_df.printSchema()
# root
#  |-- id: string (nullable = true)
#  |-- contact_persons: array (nullable = true)
#  |    |-- element: struct (containsNull = false)
#  |    |    |-- name: string (nullable = true)
#  |    |    |-- emails: string (nullable = true)
#  |    |    |-- phone: string (nullable = true)
#  |    |    |-- roles: string (nullable = true)

output_df.show(truncate=False)
# +---+-----------------------------------------------------------------------------------------------------------------------+
# |id |contact_persons                                                                                                        |
# +---+-----------------------------------------------------------------------------------------------------------------------+
# |1  |[{abc, [email protected], 896676, manager}, {pqr, [email protected], 89809043, director}, {stu, [email protected], 09909343, programmer}]|
# +---+-----------------------------------------------------------------------------------------------------------------------+
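
If the list of field names varies, the expression string from the answer could be generated instead of written by hand. The following is a minimal sketch in plain Python; `build_transform_expr` is a hypothetical helper name, not part of PySpark, and it assumes every inner array holds its values in the same order as the field names passed in.

```python
def build_transform_expr(column, fields):
    """Build a Spark SQL TRANSFORM expression string.

    Hypothetical helper (not a PySpark API). Each inner array of `column`
    is mapped to a struct whose i-th field takes the i-th array element
    and is named by the i-th entry of `fields`.
    """
    struct_args = ", ".join(
        f"el[{i}] AS {name}" for i, name in enumerate(fields)
    )
    return f"TRANSFORM({column}, el -> STRUCT({struct_args}))"


# Reproduces the expression used in the answer.
expression = build_transform_expr(
    "contact_persons", ["name", "emails", "phone", "roles"]
)
print(expression)
```

The resulting string can be passed to f.expr exactly as in the answer, for example df.withColumn('contact_persons', f.expr(expression)). The outer array can have any length, since TRANSFORM is applied per element; only the length and ordering of each inner array need to be known.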