I'm currently parsing a log file that has the following structure:
1) timestamp, preceded by # character and followed by \n
2) an arbitrary number of events that happened after that timestamp, each followed by \n
3) repeat..
Here is an example:
#100
04!
03!
02!
#1299
0L
0K
0J
0E
#1335
06!
0X#
0[#
b1010 Z$
b1x [$
...
Please forgive the seemingly cryptic values; they are encodings representing certain "events".
Note: Event encodings may also use the # character.
What I am trying to do is to count the number of events that happen at a certain time.
In other words, at time 100, 3 events happened.
I am trying to match all the text between two timestamps and count the events by simply counting the newlines enclosed in the matched text.
I'm using Python's regex engine, and I'm using the following expression:
pattern = re.compile('(#[0-9]{2,}.*)(?!#[0-9]+)')
Note: The {2,} is because I want timestamps with at least two digits.
The intent is to match a timestamp, then keep matching any other characters until the next timestamp is reached, at which point the match should end.
What this returns is:
#100
#1299
#1335
So I get the timestamps but none of the event data, which is what I really care about!
I'm thinking the reason for this is that the negative lookahead, (?!...), is "greedy" - but I'm not completely sure.
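One thing worth checking here: by default, . in Python's re module does not match a newline, so .* stops at the end of the timestamp line, which would explain the output above. The sketch below reproduces that behaviour on the truncated sample (the trailing "..." is left out) and then tries one possible alternative. The names log, original, block and counts are made up for illustration, and the alternative assumes that no event line consists solely of # followed by two or more digits.

```python
import re

# Truncated sample from the post; the trailing "..." is omitted.
log = (
    "#100\n04!\n03!\n02!\n"
    "#1299\n0L\n0K\n0J\n0E\n"
    "#1335\n06!\n0X#\n0[#\nb1010 Z$\nb1x [$\n"
)

# Original pattern: '.' does not cross newlines, so each match
# ends at the end of the timestamp line.
original = re.compile(r'(#[0-9]{2,}.*)(?!#[0-9]+)')
print(original.findall(log))  # ['#100', '#1299', '#1335']

# Possible alternative (an assumption, not the only way to do it):
# - MULTILINE lets ^ and $ anchor at line boundaries
# - DOTALL lets . cross newlines
# - .*? is lazy, and the positive lookahead stops the body at the
#   next timestamp line or at the end of the input
block = re.compile(
    r'^#(\d{2,})\n(.*?)(?=^#\d{2,}$|\Z)',
    re.DOTALL | re.MULTILINE,
)

# Count event lines per timestamp; splitlines() also copes with a
# missing final newline.
counts = {int(ts): len(body.splitlines()) for ts, body in block.findall(log)}
print(counts)  # {100: 3, 1299: 4, 1335: 5}
```

Note that simply adding re.DOTALL to the original pattern would not be enough: the greedy .* would then run to the end of the input, and the negative lookahead would be satisfied there, giving a single match.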
There may be an entirely different regex that makes this much simpler - open to any suggestions!
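One such simpler route, offered as a rough sketch rather than a definitive answer, would be to drop the multi-line regex and walk the log line by line, using a small regex only to recognise timestamp lines. The names TIMESTAMP and count_events are hypothetical, and the sketch again assumes that an event is never written as a whole line of # followed by two or more digits.

```python
import re

# A line counts as a timestamp only if the WHOLE line is '#' plus 2+ digits,
# so events that merely contain '#' (e.g. '0X#') are not misread.
TIMESTAMP = re.compile(r'#(\d{2,})')

def count_events(lines):
    """Return {timestamp: number of event lines that follow it}."""
    counts = {}
    current = None
    for raw in lines:
        line = raw.rstrip('\n')
        m = TIMESTAMP.fullmatch(line)
        if m:
            current = int(m.group(1))
            counts.setdefault(current, 0)
        elif current is not None and line:
            # Non-empty line after a timestamp: treat it as one event.
            counts[current] += 1
    return counts

# Truncated sample from the post; the trailing "..." is omitted.
sample = (
    "#100\n04!\n03!\n02!\n"
    "#1299\n0L\n0K\n0J\n0E\n"
    "#1335\n06!\n0X#\n0[#\nb1010 Z$\nb1x [$\n"
)
print(count_events(sample.splitlines()))  # {100: 3, 1299: 4, 1335: 5}
```

Because count_events accepts any iterable of lines, an open file object could be passed in directly, which avoids reading a large log into memory at once.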
Any help is much appreciated!
-k