I'm tracing log files for someone, and they are a complete mess (no line breaks or separators). So I wrote a few simple regexes to tidy them up. The logging codes are now neatly separated into a list, with each code's string attached to it in a sub-dict. It looks like this:
{
    0: {'LOGCODE_53': 'The string etc etc'},
}
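For reference, here is a minimal sketch of how the `re.findall` output from the code could be turned into that dict-of-dicts shape. The `LOGCODE_` prefix on the two-character code is an assumption made to match the example above, and `build_log_dict` is a hypothetical helper name:

```python
import re

# Same entry pattern as in the question: '00XX::message' up to the next '00YY::'
ENTRY_RX = re.compile(r'00([a-zA-Z0-9]{2})::((?:(?!00[a-zA-Z0-9]{2}::).)+)')

def build_log_dict(text):
    """Map each entry's position to {'LOGCODE_<code>': message} (hypothetical key format)."""
    entries = ENTRY_RX.findall(text)
    return {i: {'LOGCODE_' + code: message} for i, (code, message) in enumerate(entries)}

sample = "0053::The string etc etc0012::Another message"
print(build_log_dict(sample))
# {0: {'LOGCODE_53': 'The string etc etc'}, 1: {'LOGCODE_12': 'Another message'}}
```

Since the outer keys are just 0, 1, 2, ..., a plain list of `(code, message)` tuples would carry the same information.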
Since that part was fairly easy, I decided to add some log recognition as well. I can already match the LOGCODE, but the problem is that the codes don't follow any convention, and different LOGCODEs often produce the same output strings.
So I wrote a few regexes to detect what each log entry is about. My question now is: what is the best way to detect a wide variety of string patterns? There may be around 110 different types of strings, and they are so different that it's not possible to cover them all with one "super-match" regex. How can I run ~110 regexes over a string to find out the string's intent and file it accordingly in a logical register?
In other words: "take this $STRING, test it against all the $REGEXes in this $LIST, and tell me which $REGEX(es) (by index) match the string".
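A minimal sketch of exactly that, assuming the regexes can live in a plain Python list: compile each pattern once, then return the index of every pattern for which `re.search` finds a match. Only the first pattern comes from the code in this question; the other two and the name `matching_indexes` are hypothetical placeholders.

```python
import re

# Hypothetical example patterns; in practice this list would hold all ~110 regexes.
PATTERN_SOURCES = [
    r"(.*?)'s (.*?) connected (.*?) with ZZZ device (.*?)\.",
    r"disconnected from ZZZ device",
    r"\btimeout\b",
]

# Compile once up front and reuse the compiled objects for every message.
PATTERNS = [re.compile(p) for p in PATTERN_SOURCES]

def matching_indexes(message, patterns=PATTERNS):
    """Return the indexes of every pattern that matches anywhere in message."""
    return [i for i, rx in enumerate(patterns) if rx.search(message)]

print(matching_indexes("Bob's phone connected today with ZZZ device 42."))  # [0]
print(matching_indexes("Session timeout after device disconnected from ZZZ device"))  # [1, 2]
```

`rx.search` matches anywhere in the string; `rx.fullmatch` is the stricter choice if a pattern is meant to describe the whole message. Because one message can match several patterns, the function returns all indexes. If only the first hit matters, `next((i for i, rx in enumerate(PATTERNS) if rx.search(message)), None)` stops at the first match.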
My code:
import re

# Read the whole log file in one go (assumed to be UTF-8 text; it has no line breaks or separators)
with open('000000df.log', encoding='utf-8', errors='replace') as f:
    text = f.read()

# Split the text into (code, message) pairs: '00XX::message' up to the next '00YY::'
matches = re.findall(r'00([a-zA-Z0-9]{2})::((?:(?!00[a-zA-Z0-9]{2}::).)+)', text)
print('Matches: ' + str(len(matches)))
print('==========================================================================================')
for code, message in matches:
    print(code + ' >>> ' + message)
    # Only this one message type is recognised so far
    submatch = re.search(r"(.*?)'s (.*?) connected (.*?) with ZZZ device (.*?)\.", message)
    if submatch:  # skip other message types instead of raising IndexError
        print(code + ' >>> ' + ', '.join(submatch.groups()))
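Putting the pieces together, here is a hedged sketch of how the per-entry loop could feed a register: each entry is checked against a dict of named patterns and filed under every category that matches, with an `unmatched` bucket for messages that none of the ~110 patterns covers yet. The category names, the two extra patterns, and `build_register` are hypothetical; only the entry-splitting regex and the "connected ... with ZZZ device" pattern come from the code above.

```python
import re
from collections import defaultdict

# Hypothetical category names and patterns, standing in for the ~110 real ones.
CATEGORIES = {
    'device_connected': re.compile(r"(.*?)'s (.*?) connected (.*?) with ZZZ device (.*?)\."),
    'device_disconnected': re.compile(r"disconnected from ZZZ device"),
    'timeout': re.compile(r"\btimeout\b"),
}

# Entry pattern from the question: '00XX::message' up to the next '00YY::'
ENTRY_RX = re.compile(r'00([a-zA-Z0-9]{2})::((?:(?!00[a-zA-Z0-9]{2}::).)+)')

def build_register(text):
    """Group (code, message) entries under every category whose pattern matches.

    Entries that match no pattern go under 'unmatched', so new message types are easy to spot.
    """
    register = defaultdict(list)
    for code, message in ENTRY_RX.findall(text):
        hits = [name for name, rx in CATEGORIES.items() if rx.search(message)]
        for name in hits or ['unmatched']:
            register[name].append((code, message))
    return dict(register)

sample = "0053::Bob's phone connected today with ZZZ device 42.00a1::Unknown event"
for category, entries in build_register(sample).items():
    print(category, entries)
```

Keying the register by category name rather than by LOGCODE sidesteps the problem that different codes share the same output strings. Reviewing the `unmatched` bucket shows which patterns are still missing.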