Problem Statement:

I need to collect the logs from a file only after a particular time, which can be in either of these formats: 'Aug 7 11:00:00.000' or 'Aug 7 11:00:00'. The logs are in a separate .txt file and look like this:

Aug  7 11:00:00 abc newsyslog[25714]: logfile turned over due to size>1024K
Aug  7 11:00:00.000  abc xyz lol

and so on.

How do I extract this date and time using regex and then collect only the logs after a specified time? Is there a better way than regex?

Right now I am using this:

import re

monthnames = r"(?:Jan\w*|Feb\w*|Mar\w*|Apr\w*|May|Jun\w?|Jul\w?|Aug\w*|Sep\w*|Oct\w*|Nov(?:ember)?|Dec\w*)"

pattern1 = re.compile(r"(\d{1,4}[\/\\\-]+\d{1,2}[\/\\\-]+\d{2,4})")

pattern4 = re.compile(r"(?:[\d,. \-]*[,. \-])?%s(?:[\,\.\ \-]+[\d]+[stndrh]*)+[:\d]*[\ ]?(PM)?(AM)?([\ \-\+\d]{4,7}|[UTCESTGMT\ ]{2,4})*"%monthnames, re.I)

patterns = [pattern4, pattern1]

s='Aug 7 11:00:00.000'

for pattern in patterns:
    print(re.findall(pattern, s))

But it returns nothing, just an empty list!

Need help!

P.S. I can use only the Python standard library, because this is an automation script for Junos.

  • Why not use the strptime function of the built-in datetime module? Commented Aug 7, 2018 at 19:01
  • Can you please give an example? Commented Aug 7, 2018 at 19:12
  • datetime.strptime('Mon, August 13, 2018', '%a, %B %d, %Y') returns the datetime object corresponding to August 13, 2018. You can learn more by reading the documentation. Commented Aug 7, 2018 at 19:16

2 Answers

You definitely don't need regex for this. Splitting on whitespace and collecting the first three fields (month, day, time) should be more than enough, i.e.:

log_lines = ["Aug  7 11:00:00 abc newsyslog[25714]: logfile turned over due to size>1024K",
             "Aug  7 11:00:00.000  abc xyz lol"]  # we'll use a list as an example

for line in log_lines:
    date_string = " ".join(line.split(None, 3)[:-1])
    print(date_string)

# Aug 7 11:00:00
# Aug 7 11:00:00.000

Now you can use datetime.datetime.strptime() to parse it into a native date-time object, but the format directives have limitations: %b matches month abbreviations only for the current locale, and not all platforms/versions support single-digit days. Given such a simple structure, you may want to rebuild the captured date-time string before parsing it, to maximize compatibility:

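import datetime

# map lower-case month abbreviations to month numbers (locale-independent)
month_abbr = {"jan": 1, "feb": 2, "mar": 3, "apr": 4, "may": 5, "jun": 6,
              "jul": 7, "aug": 8, "sep": 9, "oct": 10, "nov": 11, "dec": 12}

def parse_date(log_line):
    # the first three whitespace-separated fields are month, day and time
    mon, day, tim = log_line.split(None, 3)[:-1]
    date_string = "{:02} {:02} ".format(month_abbr[mon.lower()], int(day)) + tim
    try:
        # try the format with fractional seconds first
        return datetime.datetime.strptime(date_string, "%m %d %H:%M:%S.%f")
    except ValueError:
        # fall back to the format without fractional seconds
        return datetime.datetime.strptime(date_string, "%m %d %H:%M:%S")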

log_lines = ["Aug  7 11:00:00 abc newsyslog[25714]: logfile turned over due to size>1024K",
             "Aug  7 11:00:00.000  abc xyz lol"]  # we'll use a list as an example

for line in log_lines:
    date_object = parse_date(line)
    print(date_object)

# 1900-08-07 11:00:00
# 1900-08-07 11:00:00

NOTE: your date-time objects will have 1900 as their year because your logs do not contain year information. The fractional seconds of the second entry are parsed as well; they just don't show up in the default string representation of a datetime.datetime object because their value is 0.
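Both points in the note can be checked with a small sketch. The timestamp values below are made up for illustration, and the target year 2018 is an assumption (the logs themselves carry no year):

```python
import datetime

FMT = "%m %d %H:%M:%S.%f"  # same format string as in parse_date() above

# Hypothetical sample values, already rebuilt into "MM DD time" form
with_fraction = datetime.datetime.strptime("08 07 11:00:00.250", FMT)
zero_fraction = datetime.datetime.strptime("08 07 11:00:00.000", FMT)

print(with_fraction.microsecond)  # 250000
print(zero_fraction)              # 1900-08-07 11:00:00 (zero microseconds are not printed)

# If a real year is needed, it can be set afterwards; 2018 is an assumed value
dated = with_fraction.replace(year=2018)
print(dated.year)                 # 2018
```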

You can now compare those date-time objects to other date-time objects and filter, print, or otherwise process the lines that match your criteria, e.g. if you want only the logs created after Aug 7:

log_lines = ["Aug  7 11:00:00 abc newsyslog[25714]: logfile turned over due to size>1024K",
             "Aug  7 11:00:00.000  abc xyz lol",
             "Aug  8 11:00:00 foo bar"]  # we'll use a list as an example

min_date = datetime.datetime(1900, 8, 8)  # minimum date set to Aug 8

for line in log_lines:
    if parse_date(line) >= min_date:
        print(line)

# Aug  8 11:00:00 foo bar
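The same idea can be applied to the original problem, where the threshold itself arrives as a string such as 'Aug 7 11:00:00.000'. The following is only a sketch under some assumptions: `parse_stamp` and `logs_after` are hypothetical helper names, lines without a leading timestamp are silently skipped, and "after" is taken to mean strictly later than the threshold.

```python
import datetime

MONTHS = {"jan": 1, "feb": 2, "mar": 3, "apr": 4, "may": 5, "jun": 6,
          "jul": 7, "aug": 8, "sep": 9, "oct": 10, "nov": 11, "dec": 12}

def parse_stamp(text):
    """Parse a leading 'Aug 7 11:00:00' or 'Aug 7 11:00:00.000' (year stays 1900)."""
    mon, day, tim = text.split(None, 3)[:3]
    stamp = "{:02d} {:02d} {}".format(MONTHS[mon.lower()], int(day), tim)
    for fmt in ("%m %d %H:%M:%S.%f", "%m %d %H:%M:%S"):
        try:
            return datetime.datetime.strptime(stamp, fmt)
        except ValueError:
            continue
    raise ValueError("unrecognised timestamp: %r" % text)

def logs_after(lines, threshold_text):
    """Return the lines whose timestamp is strictly later than threshold_text."""
    threshold = parse_stamp(threshold_text)
    kept = []
    for line in lines:
        try:
            stamp = parse_stamp(line)
        except (ValueError, KeyError):
            continue  # assumption: ignore lines that do not start with a timestamp
        if stamp > threshold:
            kept.append(line)
    return kept

sample = ["Aug  7 11:00:00 abc newsyslog[25714]: logfile turned over due to size>1024K",
          "Aug  7 11:00:00.500  abc xyz lol",
          "Aug  8 09:15:00 foo bar",
          "not a log line"]

result = logs_after(sample, "Aug 7 11:00:00.000")
for line in result:
    print(line)

# Aug  7 11:00:00.500  abc xyz lol
# Aug  8 09:15:00 foo bar
```

Because `logs_after` only iterates over its first argument, an open file object can be passed in place of the list; the file name would be whatever your log file is called.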

3 Comments

Once I extract the time, I need to extract only those logs that were collected after the specified time. For example, once I extract Aug 6 12:45:30.650, suppose the date and time are parsed as Aug 6 13:45:50.123; I need to extract logs only after this specified time. How do I do that? I've been stuck for a long time now!
I need to collect the logs using both date and time as parameters; this is only comparing the date.
@P.Saini - datetime.datetime contains a time component, too. For example, if you want to set the minimum date to 8th of August at 8:30am you can use: min_date = datetime.datetime(1900, 8, 8, 8, 30). Check the datetime.datetime() signature to see all of the available components when defining it.

I think regex is overkill for that. I would extract the date part with something like:

' '.join(line.split()[0:3])

Then use strptime() with the longer format, catch the exception, and retry with the shorter format:

from datetime import datetime

def get_date(date_str):
    try:
        return datetime.strptime(date_str, '%b %d %H:%M:%S.%f')
    except ValueError:
        return datetime.strptime(date_str, '%b %d %H:%M:%S')
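One possible way to wire the two pieces together and compare on both date and time is sketched below. `line_date` is a hypothetical helper, the sample lines are invented, `get_date` is repeated only so the snippet runs on its own, and `%b` is assumed to resolve English month abbreviations (which depends on the current locale).

```python
from datetime import datetime

def get_date(date_str):
    try:
        return datetime.strptime(date_str, '%b %d %H:%M:%S.%f')
    except ValueError:
        return datetime.strptime(date_str, '%b %d %H:%M:%S')

def line_date(line):
    # hypothetical helper: the first three fields are month, day and time
    return get_date(' '.join(line.split()[0:3]))

# the threshold uses the same textual format as the log lines
threshold = get_date('Aug 7 11:00:00.000')

lines = ["Aug  7 10:59:59 abc early entry",
         "Aug  7 11:00:00.250 abc later entry",
         "Aug  7 13:45:50.123 abc much later entry"]

# datetime comparison takes the time of day into account, not only the date
recent = [line for line in lines if line_date(line) > threshold]
print(recent)
```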
