Detect time string format in Python?

Question

I have an extremely large dataset with date/time columns in various formats. I have a validation function that detects the possible time string formats; it needs to handle 24-hour as well as 12-hour time. The separator is always :. A sample of the function is below. However, after profiling my code, this function appears to be a bottleneck in terms of execution time. Is there a faster way to do this?

import datetime
def validate_time(time_str: str):
    for time_format in ["%H:%M", "%H:%M:%S", "%H:%M:%S.%f", "%I:%M %p"]:
        try:
            return datetime.datetime.strptime(time_str, time_format)
        except ValueError:
            continue
    return None

print(validate_time(time_str="9:21 PM"))

Could you share what they are? How general-purpose do you want this to be? Should it handle 24 hour time as well as 12 hour? What about a different separator? Please include all constraints and requirements in your question.
– pho, Commented May 4, 2022 at 16:21

pho · Accepted Answer · 2022-05-04 17:10:42Z

Instead of trying to parse using every format string, you could split by colons to obtain the segments of your string that denote hours, minutes, and everything that remains. Then you can parse the result depending on the number of values the split returns:

def validate_time_new(time_str: str):
    time_vals = time_str.split(':')
    
    try:
        if len(time_vals) == 1: 
            # No split, so invalid time
            return None
        elif len(time_vals) == 2:
            if time_vals[-1][-2:].lower() in ["am", "pm"]:
                # if last element contains am or pm, try to parse as 12hr time
                return datetime.datetime.strptime(time_str, "%I:%M %p")
            else:
                # try to parse as 24h time
                return datetime.datetime.strptime(time_str, "%H:%M")
        elif len(time_vals) == 3:
            if "." in time_vals[-1]:
                # If the last element has a decimal point, try to parse microseconds
                return datetime.datetime.strptime(time_str, "%H:%M:%S.%f")
            else:
                # try to parse without microseconds
                return datetime.datetime.strptime(time_str, "%H:%M:%S")
        else: return None
    except ValueError:
        # If any of the attempts to parse throws an error, return None
        return None
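
Before timing, it may be worth checking that the split-based version returns the same results as the original. The following is a minimal sketch that repeats both functions so it runs on its own; the AM/PM check is written as a slice of the last two characters, and the `samples` list is only an illustrative set of inputs, not taken from the real dataset:

```python
import datetime

def validate_time(time_str: str):
    # Original approach: try every format in turn.
    for time_format in ["%H:%M", "%H:%M:%S", "%H:%M:%S.%f", "%I:%M %p"]:
        try:
            return datetime.datetime.strptime(time_str, time_format)
        except ValueError:
            continue
    return None

def validate_time_new(time_str: str):
    # Split-based approach: pick a single format from the shape of the string.
    time_vals = time_str.split(":")
    try:
        if len(time_vals) == 2:
            if time_vals[-1][-2:].lower() in ["am", "pm"]:
                return datetime.datetime.strptime(time_str, "%I:%M %p")
            return datetime.datetime.strptime(time_str, "%H:%M")
        if len(time_vals) == 3:
            if "." in time_vals[-1]:
                return datetime.datetime.strptime(time_str, "%H:%M:%S.%f")
            return datetime.datetime.strptime(time_str, "%H:%M:%S")
        # No colon at all, or too many segments: not a supported format.
        return None
    except ValueError:
        return None

# Illustrative inputs only (assumption): valid and invalid strings of each shape.
samples = ["12:24", "12:23:42", "13:53", "1:53 PM", "9:21 PM",
           "12:24:43.220", "not a date", "54:23:21"]

# Collect every input on which the two implementations disagree.
mismatches = [s for s in samples if validate_time(s) != validate_time_new(s)]
print(mismatches)
```

An empty list means the two functions agree on these inputs; it does not prove they agree on every string in a real column.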

To test this, let's time both methods for a bunch of test strings:

import timeit
print("old\t\t\tnew\t\t\t\told/new\t\ttest_string")
for s in ["12:24", "12:23:42", "13:53", "1:53 PM", "12:24:43.220", "not a date", "54:23:21"]:
    t1 = timeit.timeit('validate_time(s)', 'from __main__ import datetime, validate_time, s', number=100)
    t2 = timeit.timeit('validate_time_new(s)', 'from __main__ import datetime, validate_time_new, s', number=100)
    print(f"{t1:.6f}\t{t2:.6f}\t\t{t1/t2:.6f}\t\t{s}")

old         new             old/new     test_string
0.001628    0.001143        1.424322        12:24
0.001567    0.001012        1.548661        12:23:42
0.000935    0.000979        0.955177        13:53
0.003004    0.000722        4.161657        1:53 PM
0.004523    0.001396        3.241204        12:24:43.220
0.002148    0.000025        84.897370       not a date
0.002262    0.000622        3.638629        54:23:21
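
The "not a date" row, where no strptime call is made at all, suggests that most of the remaining time is spent inside strptime. If the column contains many repeated strings (an assumption about the data, not something stated in the question), memoizing the parser with functools.lru_cache might help further. A rough sketch, where `pick_format` and `parse_time_cached` are hypothetical names introduced only for this example:

```python
import datetime
from functools import lru_cache

def pick_format(time_str: str):
    # Hypothetical helper: choose one candidate format from the string's shape.
    parts = time_str.split(":")
    if len(parts) == 2:
        if parts[-1][-2:].lower() in ("am", "pm"):
            return "%I:%M %p"
        return "%H:%M"
    if len(parts) == 3:
        return "%H:%M:%S.%f" if "." in parts[-1] else "%H:%M:%S"
    return None

@lru_cache(maxsize=None)  # unbounded cache; set a limit if distinct values are many
def parse_time_cached(time_str: str):
    # Each distinct string is parsed once; repeats are served from the cache.
    time_format = pick_format(time_str)
    if time_format is None:
        return None
    try:
        return datetime.datetime.strptime(time_str, time_format)
    except ValueError:
        return None

# Illustrative column with repeated values (assumption).
values = ["12:24", "1:53 PM", "12:24", "not a date", "12:24"]
parsed = [parse_time_cached(v) for v in values]
print(parse_time_cached.cache_info())
```

Whether this pays off depends on how often values repeat; with mostly unique strings the cache only adds memory overhead, so it would need to be profiled on the real data.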
