I have a dataframe that looks like this:
        YEAR  MONTH  DAY_OF_MONTH  DAY_OF_WEEK  ORIGIN_CITY_NAME  ORIGIN_STATE_ABR  DEST_CITY_NAME  DEST_STATE_ABR  DEP_TIME  DEP_DELAY_NEW  ARR_TIME  ARR_DELAY_NEW  CANCELLED  AIR_TIME
     0  2020      1             1            3           Ontario                CA   San Francisco              CA      1851             41      2053             68          0        74
     1  2020      1             1            3           Ontario                CA   San Francisco              CA      1146              0      1318              0          0        71
     2  2020      1             1            3           Ontario                CA        San Jose              CA      2016              0      2124              0          0        57
     3  2020      1             1            3           Ontario                CA        San Jose              CA      1350             10      1505             10          0        63
     4  2020      1             1            3           Ontario                CA        San Jose              CA       916              1      1023              0          0        57
   ...   ...    ...           ...          ...               ...               ...             ...             ...       ...            ...       ...            ...        ...       ...
607341  2020      1            16            4          Portland                ME        New York              NY       554              0       846             65          0        57
607342  2020      1            17            5          Portland                ME        New York              NY       633             33       804             23          0        69
607343  2020      1            18            6          Portland                ME        New York              NY       657              0       810              0          0        55
607344  2020      1            19            7          Portland                ME        New York              NY       705              5       921             39          0        54
607345  2020      1            20            1          Portland                ME        New York              NY       628              0       741              0          0        52
I am trying to convert the DEP_TIME and ARR_TIME columns to the format hh:mm, so that, for example, 1851 becomes 18:51 and 916 becomes 09:16. All values should end up as strings. Some rows contain null values, which need to be handled as well. Performance also matters (although it is secondary to solving the actual problem), since I need to convert about 10M records in total.
The challenge for me is figuring out how to replace these values conditionally while still having access to the original value at the point of replacement. I could not find a solution to that specific problem elsewhere; most examples replace values with a known constant.
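To make the goal concrete, here is a minimal sketch of the kind of transformation I mean, run on a toy frame rather than the real data. It assumes the times are stored as numbers with NaN for missing values and that nulls should stay null after the conversion. The helper name `to_hhmm` is made up for illustration, and I have not checked how this scales to 10M rows.

```python
import pandas as pd

def to_hhmm(col: pd.Series) -> pd.Series:
    """Turn values like 916 or 1851 into 'hh:mm' strings, keeping nulls as nulls.

    Hypothetical helper: assumes each non-null value is an integer-like time
    such as 916 (meaning 09:16) or 1851 (meaning 18:51).
    """
    # The nullable Int64 dtype keeps missing values as <NA> and avoids
    # float artifacts such as "916.0" when converting to strings.
    as_int = pd.to_numeric(col, errors="coerce").astype("Int64")
    # Convert to the string dtype and left-pad with zeros to four digits.
    padded = as_int.astype("string").str.zfill(4)
    # Slice out hours and minutes; <NA> propagates through each step.
    return padded.str[:2] + ":" + padded.str[2:]

# Toy data standing in for the real DEP_TIME / ARR_TIME columns.
df = pd.DataFrame({
    "DEP_TIME": [1851, 916, None],
    "ARR_TIME": [2053, 1023, None],
})

for name in ["DEP_TIME", "ARR_TIME"]:
    df[name] = to_hhmm(df[name])

print(df)
```

Every step here operates on the whole column at once rather than row by row, so each new value is derived from the original value in the same position without an explicit loop.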
Thanks for your help.