If you're on Spark 3.1+, you can use the transform function with withField within a lambda function to update each struct element of the array.
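For reference, here's roughly how the data_sdf used below could be built. This is a minimal sketch; the column and field names are taken from the schema shown further down.

from pyspark.sql import SparkSession, functions as func

spark = SparkSession.builder.getOrCreate()

# array of structs, with col_ts stored as a string
data_sdf = spark.createDataFrame(
    [(1,
      [('test', 'testB', '2020-08-17T03:28:44.986000'),
       ('UNREAD', 'USER', '2020-08-17T03:28:44.986000')])],
    'id int, parent_col array<struct<col_a:string,col_b:string,col_ts:string>>'
)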
# the format string below doesn't cover the microseconds in col_ts, so the lenient
# LEGACY parser is used so the trailing fraction doesn't break the parse
spark.conf.set('spark.sql.legacy.timeParserPolicy', 'LEGACY')
data_sdf. \
    withColumn('parent_col_new',
               func.transform('parent_col',
                              lambda x: x.withField('col_ts',
                                                    func.to_timestamp(x.col_ts, "yyyy-MM-dd'T'HH:mm:ss")
                                                    )
                              )
               ). \
    show(truncate=False)
# +---+---------------------------------------------------------------------------------------+-------------------------------------------------------------------------+
# |id |parent_col |parent_col_new |
# +---+---------------------------------------------------------------------------------------+-------------------------------------------------------------------------+
# |1 |[{test, testB, 2020-08-17T03:28:44.986000}, {UNREAD, USER, 2020-08-17T03:28:44.986000}]|[{test, testB, 2020-08-17 03:28:44}, {UNREAD, USER, 2020-08-17 03:28:44}]|
# +---+---------------------------------------------------------------------------------------+-------------------------------------------------------------------------+
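# resulting schema for reference (via printSchema); col_ts inside parent_col_new is now a timestamp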
# root
# |-- id: integer (nullable = false)
# |-- parent_col: array (nullable = false)
# | |-- element: struct (containsNull = false)
# | | |-- col_a: string (nullable = true)
# | | |-- col_b: string (nullable = true)
# | | |-- col_ts: string (nullable = true)
# |-- parent_col_new: array (nullable = false)
# | |-- element: struct (containsNull = false)
# | | |-- col_a: string (nullable = true)
# | | |-- col_b: string (nullable = true)
# | | |-- col_ts: timestamp (nullable = true)
If withField and/or the Python transform function isn't available in your Spark version, you can use expr with the SQL transform higher-order function (available since Spark 2.4) and recreate the struct. It results in the same output.
data_sdf. \
    withColumn('parent_col_new',
               func.expr('''
                   transform(parent_col,
                             x -> struct(x.col_a as col_a,
                                         x.col_b as col_b,
                                         to_timestamp(x.col_ts, "yyyy-MM-dd'T'HH:mm:ss") as col_ts
                                         )
                             )
                   ''')
               ). \
    show(truncate=False)
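As a side note, the LEGACY parser setting is only needed because the format string leaves out the microseconds in col_ts. If you'd rather stay on the default (Spark 3+) parser, you could cover the fraction in the pattern instead; the parsed timestamps then keep their fractional part. A rough sketch, assuming col_ts always carries six fractional digits:

from pyspark.sql import SparkSession, functions as func

spark = SparkSession.builder.getOrCreate()

# the default parser accepts the value once the pattern covers the fraction
spark.range(1). \
    select(func.to_timestamp(func.lit('2020-08-17T03:28:44.986000'),
                             "yyyy-MM-dd'T'HH:mm:ss.SSSSSS").alias('col_ts')
           ). \
    show(truncate=False)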