You definitely don't need regex for this - a simple split on whitespace and collecting the first three fields (month, day and time) should be more than enough, e.g.:
    log_lines = ["Aug 7 11:00:00 abc newsyslog[25714]: logfile turned over due to size>1024K",
                 "Aug 7 11:00:00.000 abc xyz lol"]  # we'll use a list as an example

    for line in log_lines:
        date_string = " ".join(line.split(None, 3)[:-1])
        print(date_string)

    # Aug 7 11:00:00
    # Aug 7 11:00:00.000
Now you can use datetime.datetime.strptime() to parse it into a native date-time object, but the format directives may limit you: %b matches month abbreviations only in the current locale, and not all platforms/versions accept single-digit days. Given such a simple structure, you might want to rebuild the captured date-time string before parsing it, to maximize compatibility:
    import datetime

    month_abbr = {"jan": 1, "feb": 2, "mar": 3, "apr": 4, "may": 5, "jun": 6,
                  "jul": 7, "aug": 8, "sep": 9, "oct": 10, "nov": 11, "dec": 12}

    def parse_date(log_line):
        # the first three whitespace-separated fields are month, day and time
        mon, day, tim = log_line.split(None, 3)[:-1]
        # rebuild as zero-padded numbers, e.g. "08 07 11:00:00"
        date_string = "{:02} {:02} ".format(month_abbr[mon.lower()], int(day)) + tim
        try:  # try the format with fractional seconds first
            return datetime.datetime.strptime(date_string, "%m %d %H:%M:%S.%f")
        except ValueError:  # no fractional seconds present
            return datetime.datetime.strptime(date_string, "%m %d %H:%M:%S")

    log_lines = ["Aug 7 11:00:00 abc newsyslog[25714]: logfile turned over due to size>1024K",
                 "Aug 7 11:00:00.000 abc xyz lol"]  # we'll use a list as an example

    for line in log_lines:
        date_object = parse_date(line)
        print(date_object)

    # 1900-08-07 11:00:00
    # 1900-08-07 11:00:00
NOTE: your date-time objects will have 1900 as their year because your logs do not contain year information. The fractional seconds of the second entry are parsed as well; they just don't show up in the output because their value is 0 and the default string representation of a datetime.datetime object omits zero microseconds.
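If you need a real year on those objects, one option is to stamp an assumed year onto them with datetime.replace(). This is only a minimal sketch, assuming all log entries belong to one known year; with_year is a hypothetical helper name and 2018 is an arbitrary example year:

```python
import datetime

def with_year(parsed, year):
    """Return a copy of `parsed` with its year replaced (hypothetical helper)."""
    return parsed.replace(year=year)

# a year-less object like the ones strptime() produces without %Y
parsed = datetime.datetime(1900, 8, 7, 11, 0, 0)
print(with_year(parsed, 2018))
# 2018-08-07 11:00:00
```

Logs that span a year boundary would need extra logic to decide which year each entry belongs to.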
You can now compare those date-time objects to other date-time objects and filter, print or otherwise process the lines that match your criteria, e.g. if you want only logs created after Aug 7:
    log_lines = ["Aug 7 11:00:00 abc newsyslog[25714]: logfile turned over due to size>1024K",
                 "Aug 7 11:00:00.000 abc xyz lol",
                 "Aug 8 11:00:00 foo bar"]  # we'll use a list as an example

    min_date = datetime.datetime(1900, 8, 8)  # minimum date set to Aug 8

    for line in log_lines:
        if parse_date(line) >= min_date:
            print(line)

    # Aug 8 11:00:00 foo bar

Have you tried the strptime function of the built-in datetime module? datetime.strptime('Mon, August 13, 2018', '%a, %B %d, %Y') returns the datetime object corresponding to August 13, 2018. You can learn more by reading the documentation.
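As a quick runnable check of that call, here is a small sketch. It assumes an English (or C) locale, since %a and %B match weekday and month names in the current locale:

```python
from datetime import datetime

# %a = abbreviated weekday, %B = full month name, %d = day, %Y = four-digit year
parsed = datetime.strptime('Mon, August 13, 2018', '%a, %B %d, %Y')
print(parsed)
# 2018-08-13 00:00:00
```

No time of day is given in the string, so the time fields default to midnight.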