I have a dataframe that looks like this:

        YEAR  MONTH  DAY_OF_MONTH  DAY_OF_WEEK ORIGIN_CITY_NAME ORIGIN_STATE_ABR DEST_CITY_NAME DEST_STATE_ABR DEP_TIME DEP_DELAY_NEW ARR_TIME ARR_DELAY_NEW CANCELLED AIR_TIME
0       2020      1             1            3          Ontario               CA  San Francisco             CA     1851            41     2053            68         0       74
1       2020      1             1            3          Ontario               CA  San Francisco             CA     1146             0     1318             0         0       71
2       2020      1             1            3          Ontario               CA       San Jose             CA     2016             0     2124             0         0       57
3       2020      1             1            3          Ontario               CA       San Jose             CA     1350            10     1505            10         0       63
4       2020      1             1            3          Ontario               CA       San Jose             CA      916             1     1023             0         0       57
...      ...    ...           ...          ...              ...              ...            ...            ...      ...           ...      ...           ...       ...      ...
607341  2020      1            16            4         Portland               ME       New York             NY      554             0      846            65         0       57
607342  2020      1            17            5         Portland               ME       New York             NY      633            33      804            23         0       69
607343  2020      1            18            6         Portland               ME       New York             NY      657             0      810             0         0       55
607344  2020      1            19            7         Portland               ME       New York             NY      705             5      921            39         0       54
607345  2020      1            20            1         Portland               ME       New York             NY      628             0      741             0         0       52

I am trying to convert the DEP_TIME and ARR_TIME columns to the format hh:mm (e.g. 1851 should become 18:51 and 916 should become 09:16). All resulting values should be strings. Some rows also contain null values, which need to be handled. Performance also matters (though it is secondary to solving the problem itself), since I need to convert about 10M records in total.
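A minimal vectorized sketch, assuming the columns hold HHMM-style numbers (or digit strings) such as 1851 or 916, possibly mixed with nulls: coerce to a nullable integer, zero-pad to four characters, and splice in a colon. The helper name `to_hhmm` is hypothetical. Every step operates on the whole column at once, so there is no Python-level loop, which is what matters at around 10M rows.

```python
import pandas as pd


def to_hhmm(col: pd.Series) -> pd.Series:
    """Hypothetical helper: turn HHMM-style times (1851, 916, "633") into 'hh:mm' strings.

    Missing or non-numeric values become <NA>. Assumes every non-null value
    is a whole number.
    """
    # Nulls force a float dtype (1851 -> 1851.0); the nullable Int64 dtype undoes that
    nums = pd.to_numeric(col, errors="coerce").astype("Int64")
    # The nullable "string" dtype keeps NA as NA instead of the text "nan"
    s = nums.astype("string").str.zfill(4)
    # Splice a colon between the hour and minute digits
    return s.str[:2] + ":" + s.str[2:]


sample = pd.Series([1851, 916, None, 5])
print(to_hhmm(sample).tolist())
```

Applied to the frame this would be `df["DEP_TIME"] = to_hhmm(df["DEP_TIME"])`, and the same for ARR_TIME. A value such as 2400, if present, would come out as "24:00", and a non-integral float would make the `Int64` cast raise, so both cases may need separate handling.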

For me, the challenge is figuring out how to modify these values iteratively based on a condition while still having access to the original value when replacing it. I could not find a solution to that specific problem elsewhere; most answers I found replace values with a known constant.
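On the point about replacing values conditionally while still reading the originals: in pandas the right-hand side of a masked assignment can itself be an expression over the original column, so no explicit iteration is needed. A sketch using a hypothetical four-row sample stored as strings; the boolean mask selects the rows that need padding, and the replacement is derived from those same rows.

```python
import pandas as pd

# Hypothetical sample mirroring the question's DEP_TIME column, stored as strings
df = pd.DataFrame({"DEP_TIME": ["1851", "916", None, "5"]}, dtype="string")

col = df["DEP_TIME"]
# Condition: value is present and shorter than 4 characters
needs_pad = col.notna() & (col.str.len() < 4)

# Replacement computed from the ORIGINAL values; only rows matching the mask change
df.loc[needs_pad, "DEP_TIME"] = col[needs_pad].str.zfill(4)

# Second step, again derived from the column's current values (NA stays NA)
df["DEP_TIME"] = df["DEP_TIME"].str[:2] + ":" + df["DEP_TIME"].str[2:]
print(df)
```

The same idea works with `Series.mask(cond, other)` or `numpy.where(cond, new, old)`, where `other`/`new` is built from the original column.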

Thanks for your help.

  • Does this answer your question? stackoverflow.com/q/42529454/4037715 Commented Feb 6, 2021 at 10:34
  • In this case I would have to convert the targeted strings to a time object and back to a string again, since I want the format hh:mm in my dataframe. That seems complicated. Commented Feb 6, 2021 at 10:48
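For the time-object round trip mentioned in the comment, pandas can parse and re-format in two vectorized calls. A sketch, assuming the values are zero-padded first so that the `%H%M` format reads 916 as 09:16. With `errors="coerce"`, nulls and strings that are not valid clock times (such as 2400) become NaT, which stays missing after formatting. This probably does more work than plain string slicing, since every value is parsed into a timestamp.

```python
import pandas as pd

times = pd.Series([1851, 916, None, 2400])

# Zero-pad to 4 digits first so "%H%M" parses e.g. 916 as 09:16
padded = (
    pd.to_numeric(times, errors="coerce")
    .astype("Int64")
    .astype("string")
    .str.zfill(4)
)
# Parse as clock times; nulls and invalid values such as 2400 become NaT
parsed = pd.to_datetime(padded, format="%H%M", errors="coerce")
# Format back to 'hh:mm'; NaT entries come out as missing values
hhmm = parsed.dt.strftime("%H:%M")
print(hhmm.tolist())
```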
