I have a dataframe with the following schema using pyspark:
|-- suborders: array (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- trackingStatusHistory: array (nullable = true)
| | | |-- element: struct (containsNull = true)
| | | | |-- trackingStatusUpdatedAt: string (nullable = true)
| | | | |-- trackingStatus: string (nullable = true)
What I want to do is create a new deliveredat element for each suborders array using conditions.
I need to find the date within the trackingStatusHistory array where trackingStatusHistory.trackingStatus = 'delivered'. If this trackingStatus exists, the new deliveredat element will receive the date in trackingStatusHistory.trackingStatusUpdatedAt. If doesn't exist, receive null.
How can I do this using pyspark?