I'm tracing log files for someone and they are a complete mess (no line breaks or separators), so I wrote some simple regexes to tidy them up. The log codes are now nicely separated in a list, with each code's string attached to it in a sub-dict. It looks like this:

Dict [
    0 : [LOGCODE_53 : 'The string etc etc']
]

As this was fairly easy, I planned to add some log recognition to it right away. I can match the LOGCODE, but the problem is that the codes aren't compliant with any standard, and different LOGCODEs often contain the same output strings.

So I wrote a few regex matches to detect what the log is about. My question now is: what is a sensible way to detect a large variety of string patterns? There might be around 110 different types of strings, and they are so different that it's not possible to "super-match" them. How can I run ~110 regexes over a string to find out the string's intent and thus index it in a logical register?

So kind of like: "take this $STRING, test all the $REGEXes in this $LIST, and let me know which $REGEX(es) (by index) match the string".

My code:

import re

# Open, read, and close the log file
with open('000000df.log', 'rb') as f:
    text = f.read()

matches = re.findall(r'00([a-zA-Z0-9]{2})::((?:(?!00[a-zA-Z0-9]{2}::).)+)', text)

print 'Matches: ' + str(len(matches))
print '=========================================================================================='

for match in matches:
    submatching = re.findall(r'(.*?)\'s (.*?) connected (.*?) with ZZZ device (.*?)\.', match[1])

    print match[0] + ' >>> ' + match[1]
    print match[0] + ' >>> ' + submatching[0][0] + ', ' + submatching[0][1] + ',',
    print submatching[0][2] + ', ' + submatching[0][3]
  • That code block isn't python. Commented Jan 7, 2013 at 20:47
  • Can you post some more samples of the strings? Commented Jan 7, 2013 at 20:48
  • @Falmarri It was just to give an indication of the list/dict structure. I've added the code. Where the "submatching" match is now, I actually want to test the string against multiple regexes and see which ones match. Commented Jan 7, 2013 at 20:51
  • Are the 110 different types actual words from the English language? Commented Jan 7, 2013 at 20:55
  • No, they are all very different logs, sometimes with technical data too. Server and system log files. Commented Jan 7, 2013 at 20:56

1 Answer

re.match and re.search return None if a particular regex doesn't match, and re.findall returns an empty list. Both results are falsy, so you could just iterate over your possible regular expressions and test them:

tests = [
    re.compile(r'...'),
    re.compile(r'...'),
    re.compile(r'...'),
    re.compile(r'...')
]

for test in tests:
    # An empty result means this regex did not match
    matches = test.findall(your_string)

    if matches:
        print test.pattern, 'works'
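
To get back which regexes matched, as indexes into the list, the loop above can be wrapped in a small helper. This is only a minimal sketch: `matching_indexes` and the three sample patterns are hypothetical names made up for illustration, not taken from the real logs. It uses nothing version-specific, so it should run on both Python 2 and 3.

```python
import re

# Hypothetical patterns; replace with the real ~110 expressions.
tests = [
    re.compile(r"connected .* with ZZZ device"),
    re.compile(r"disk (\d+)% full"),
    re.compile(r"device"),
]


def matching_indexes(string, patterns):
    """Return the indexes of all patterns that match somewhere in string."""
    return [i for i, pattern in enumerate(patterns) if pattern.search(string)]
```

Using `search` instead of `findall` avoids building a list of every match when only a yes/no answer per regex is needed.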

2 Comments

Yes, actually that was my question :). I was just wondering if there was some super-function for that method (maybe faster?), as it might concern tens of thousands of log entries. But I just benchmarked 2 million entries, and that only took 4 seconds, so I'm not worried about that anymore :P, thank you :).
@Allendar: Make sure to re.compile() your regexes beforehand. It'll speed things up even more.
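
Following the compile-once advice, one possible way to build the "logical register" from the question is to keep the compiled patterns in a labelled table and group the log strings by every label that matches. This is a sketch under assumptions: the labels, the patterns, and the `build_register` helper are all hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical labels and patterns, compiled once up front.
PATTERNS = {
    "device_connected": re.compile(r"connected .* with ZZZ device"),
    "disk_usage": re.compile(r"disk \d+% full"),
}


def build_register(entries, patterns=PATTERNS):
    """Group log strings by the label of every pattern that matches them."""
    register = defaultdict(list)
    for entry in entries:
        for label, pattern in patterns.items():
            if pattern.search(entry):
                register[label].append(entry)
    return register
```

An entry that matches several patterns ends up under each of their labels, and an entry that matches none is simply left out of the register.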
