Check strings in a for loop for multiple regexes

Question

I'm tracing log files for someone, and they are a complete mess (no line breaks or separators). So I wrote a few simple regexes to tidy them up. The logging codes are now nicely separated into a list, each with its string attached in a sub-dict. It looks like this:

[
    {'LOGCODE_53': 'The string etc etc'},  # index 0
]

Since this was fairly easy, I intended to add some log recognition directly as well. I can now match the LOGCODE, but the problem is that the codes don't follow any convention, and different LOGCODEs often produce the same output strings.

So I wrote a few regexes to detect what each log entry is about. My question is: what is the sensible way to detect a wide variety of string patterns? There may be around 110 different types of strings, and they are so different that it's not possible to "super-match" them with a single pattern. How can I run ~110 regexes over a string to find out the string's intent and index it in a logical register?

So, something like: "take this $STRING, test it against all the $REGEXes in this $LIST, and tell me which $REGEX(es) (indexes) match the string".
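A minimal sketch of exactly that request, assuming the patterns are kept in a plain Python list of precompiled regexes. The three patterns and the sample message are hypothetical placeholders, not real log formats:

```python
import re

# Hypothetical example patterns; in practice this list would hold ~110 entries
PATTERNS = [
    re.compile(r"(.*?)'s (.*?) connected (.*?) with ZZZ device (.*?)\."),
    re.compile(r"disk (\w+) is full"),
    re.compile(r"connected"),
]


def matching_indexes(message, patterns=PATTERNS):
    """Return the index of every pattern that matches somewhere in message."""
    # search() returns a match object or None, so it works as a truth test
    return [i for i, pattern in enumerate(patterns) if pattern.search(message)]


print(matching_indexes("Bob's laptop connected wirelessly with ZZZ device 7."))
# → [0, 2]
```

Because the result is a list, a message recognized by several patterns (as with LOGCODEs that share output strings) reports all of them instead of only the first.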

My code:

import re

# Open, read out and close the log file (the with-block closes it automatically);
# errors='replace' keeps undecodable bytes from raising an exception
with open('000000df.log', errors='replace') as f:
    text = f.read()

# Each entry is "00" + a two-character code + "::", followed by everything
# up to the next such marker
matches = re.findall(r'00([a-zA-Z0-9]{2})::((?:(?!00[a-zA-Z0-9]{2}::).)+)', text)

print('Matches: ' + str(len(matches)))
print('==========================================================================================')

for match in matches:
    submatching = re.findall(r'(.*?)\'s (.*?) connected (.*?) with ZZZ device (.*?)\.', match[1])

    print(match[0] + ' >>> ' + match[1])
    # Only some messages have this shape; indexing submatching[0] would fail otherwise
    if submatching:
        print(match[0] + ' >>> ' + ', '.join(submatching[0]))
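To see what the splitting regex does, here is a sketch run on a short made-up sample (the codes and messages are invented for illustration). The `(?:(?!00[a-zA-Z0-9]{2}::).)+` part consumes one character at a time as long as the upcoming text does not start another `00XX::` marker, so each message ends right before the next code. The pairs can then be turned into the list of single-entry dicts described in the question:

```python
import re

# Same splitting pattern as in the question: "00" + two-character code + "::",
# followed by everything up to (but not including) the next such marker
ENTRY = re.compile(r'00([a-zA-Z0-9]{2})::((?:(?!00[a-zA-Z0-9]{2}::).)+)')

# Invented sample of a log with no line breaks or separators
sample = "0053::Bob's laptop connected wirelessly with ZZZ device 7.00a1::disk sda1 is full"

pairs = ENTRY.findall(sample)
print(pairs)
# → [('53', "Bob's laptop connected wirelessly with ZZZ device 7."), ('a1', 'disk sda1 is full')]

# One single-entry dict per log line, kept in order in a list
entries = [{code: message} for code, message in pairs]
print(entries[0])
```

Note that `.` does not match newlines by default; if some logs do contain line breaks, compiling with `re.DOTALL` lets a message span them.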

@Falmarri It was just to give an indication of the list/dict structure. I've added the code. Where the "submatching" match is, I actually want to test the string against multiple regexes and see which ones match. – user1467267, Commented Jan 7, 2013 at 20:51
Are the 110 different types actual words from the English language? – tzelleke, Commented Jan 7, 2013 at 20:55
No, they are all very different log messages, sometimes containing technical data too: server and system log files. – user1467267, Commented Jan 7, 2013 at 20:56

Blender · Accepted Answer · 2013-01-07 21:01:41Z


re.match and re.search return None if a particular regex doesn't match, and re.findall returns an empty list. Both are falsy, so you can just iterate over your candidate regular expressions and test each one:

import re

tests = [
    re.compile(r'...'),
    re.compile(r'...'),
    re.compile(r'...'),
    re.compile(r'...')
]

for test in tests:
    # findall() returns an empty (falsy) list when the pattern doesn't match
    matches = test.findall(your_string)

    if matches:
        print(test.pattern, 'works')
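Building on that loop, a sketch of filing every log entry into a "logical register": each pattern gets a label, and each entry is appended under every label whose pattern matches it. The labels, patterns and sample entries are hypothetical. It uses `search()` rather than `findall()` because only a yes/no answer is needed, and `search()` stops at the first hit instead of collecting every occurrence:

```python
import re
from collections import defaultdict

# Hypothetical labels mapped to precompiled patterns (compile once, reuse many times)
CLASSIFIERS = {
    'device_connect': re.compile(r"(.*?)'s (.*?) connected (.*?) with ZZZ device (.*?)\."),
    'disk_full': re.compile(r'disk (\w+) is full'),
}


def build_register(entries):
    """Group (code, message) pairs under every label whose pattern matches the message."""
    register = defaultdict(list)
    for code, message in entries:
        labels = [label for label, pattern in CLASSIFIERS.items() if pattern.search(message)]
        # Keep unrecognized messages visible instead of silently dropping them
        for label in labels or ['unclassified']:
            register[label].append((code, message))
    return dict(register)


log = [
    ('53', "Bob's laptop connected wirelessly with ZZZ device 7."),
    ('a1', 'disk sda1 is full'),
    ('7f', 'service restarted'),
]
for label, items in build_register(log).items():
    print(label, items)
```

Keying the register by label rather than by LOGCODE sidesteps the problem that different codes can carry the same message.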

answered Jan 7, 2013 at 21:01

Blender



2 Comments

user1467267 Over a year ago

Yes, that was actually my question :). I was just wondering whether there was some super-function for that approach (maybe faster?), since it might involve tens of thousands of log entries. But I just benchmarked 2 million entries and it took only 4 seconds, so I'm not worried about it anymore :P. Thank you :).

Blender Over a year ago

@Allendar: Make sure to re.compile() your regexes beforehand. It'll speed things up even more.
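A rough way to check that advice on your own data with the standard `timeit` module. The pattern and message are placeholders, and the timings will vary by machine and Python version. Note that the `re` module also caches recently compiled patterns internally, so the saving from an explicit `re.compile()` largely comes from skipping that cache lookup on every call:

```python
import re
import timeit

PATTERN = r"(.*?)'s (.*?) connected (.*?) with ZZZ device (.*?)\."
COMPILED = re.compile(PATTERN)
MESSAGE = "Bob's laptop connected wirelessly with ZZZ device 7."


def module_level():
    # Passes the pattern string each time; re looks it up in its internal cache
    return re.search(PATTERN, MESSAGE)


def precompiled():
    # Reuses the already compiled pattern object directly
    return COMPILED.search(MESSAGE)


# Both variants must agree before comparing speed
assert module_level().groups() == precompiled().groups()

for func in (module_level, precompiled):
    seconds = timeit.timeit(func, number=100_000)
    print(f'{func.__name__}: {seconds:.3f} s for 100,000 searches')
```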
