I have a log file that I am trying to parse. An example line from the log file is below:

Oct 23 13:03:03.714012 prod1_xyz(RSVV)[201]: #msgtype=EVENT #server=Web/Dev@server1web #func=LKZ_WriteData ( line 2992 ) #rc=0 #msgid=XYZ0064 #reqid=0 #msg=Web Activity end (section 200, # SysD 1, Files 222, Bytes 343422089928, Errors 0, Aborted Files 0, Busy Files 0)

I want to pull out every piece of text that starts with a hash and has a key and a value, for example #msgtype=EVENT. Any text that has only a hash and no "=" sign should be treated as part of the value.

So for the log entry above, I want a list that looks like this:

#msgtype=EVENT
#server=Web/Dev@server1web
#func=LKZ_WriteData ( line 2992 ) 
#rc=0
#msgid=XYZ0064 
#reqid=0
#msg=Web Activity end (section 200, # SysD 1, Files 222, Bytes 343422089928, Errors 0, Aborted Files 0, Busy Files 0) (Notice the hash present in the middle of the text)

I have tried Python's re.findall, but it does not capture all of the data.

For example:

import re

text = 'Oct 23 13:03:03.714012 prod1_xyz(RSVV)[201]: #msgtype=EVENT #server=Web/Dev@server1web #func=LKZ_WriteData ( line 2992 ) #rc=0 #msgid=XYZ0064 #reqid=0 #msg=Web Activity end (section 200, # SysD 1, Files 222, Bytes 343422089928, Errors 0, Aborted Files 0, Busy Files 0)'

z = re.findall(r"(#.+?=.+?)(:?#|$)", text)
print(z)

Output:

[('#msgtype=EVENT ', '#'), ('#func=LKZ_WriteData ( line 2992 ) ', '#'), ('#msgid=XYZ0064 ', '#'), ('#msg=Web Activity end (section 200, ', '#')]
Comment: re.findall(r'#[^\s=]+=.*?(?=\s*#[^\s=]+=|$)', text) (commented Oct 25, 2019 at 14:14)

2 Answers

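In your pattern, (:?#|$) is a capturing group that matches an optional : followed by #, or the end of the string (you probably meant the non-capturing (?:#|$)). Because the pattern contains two capturing groups, re.findall returns a list of tuples of the captured substrings rather than the full matches. The group also consumes the # that starts the next key, so every other key=value pair is skipped, and the lazy .+? stops at the first #, which cuts the #msg value short at "# SysD".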
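To see both effects in isolation, here is a minimal sketch on a made-up three-field string (the `sample` value and variable names are assumptions for illustration, not part of the question's data). It compares the original pattern with a lookahead-based one.

```python
import re

# Hypothetical short input, chosen only to make the behaviour easy to read.
sample = "#a=1 #b=2 #c=3"

# Two capturing groups: findall returns one tuple per match, and the
# second group consumes the '#' of the next field, so '#b=2' is skipped.
with_groups = re.findall(r"(#.+?=.+?)(:?#|$)", sample)
print(with_groups)     # [('#a=1 ', '#'), ('#c=3', '')]

# A lookahead only checks what follows without consuming it, and with
# no capturing groups findall returns the whole match as a string.
with_lookahead = re.findall(r"#[^\s=]+=.*?(?=\s*#[^\s=]+=|$)", sample)
print(with_lookahead)  # ['#a=1', '#b=2', '#c=3']
```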

You need

re.findall(r'#[^\s=]+=.*?(?=\s*#[^\s=]+=|$)', text)

Regex details

  • #[^\s=]+ - # and then any 1+ chars other than whitespace and =
  • = - a = char
  • .*? - any 0+ chars other than line break chars, as few as possible
  • (?=\s*#[^\s=]+=|$) - a positive lookahead that stops the match just before 0+ whitespaces, #, 1+ chars other than whitespace and =, and then =, or at the end of string. The lookahead does not consume any text.
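Putting the pieces together, the following is a small sketch that applies this pattern to the log line from the question and collects the fields into a dict. The helper name `parse_pairs`, the constant `PAIR_RE`, and the choice to strip the leading # from keys are assumptions made for illustration.

```python
import re

# Pattern from the answer: '#key=' followed by the value, which runs up to
# the next '#key=' token or the end of the line.
PAIR_RE = re.compile(r'#[^\s=]+=.*?(?=\s*#[^\s=]+=|$)')

def parse_pairs(line):
    """Return {key: value} for every '#key=value' field in one log line."""
    pairs = {}
    for field in PAIR_RE.findall(line):
        # Split on the first '=' only, so a value may itself contain '='.
        key, value = field.split('=', 1)
        pairs[key.lstrip('#')] = value
    return pairs

text = 'Oct 23 13:03:03.714012 prod1_xyz(RSVV)[201]: #msgtype=EVENT #server=Web/Dev@server1web #func=LKZ_WriteData ( line 2992 ) #rc=0 #msgid=XYZ0064 #reqid=0 #msg=Web Activity end (section 200, # SysD 1, Files 222, Bytes 343422089928, Errors 0, Aborted Files 0, Busy Files 0)'

fields = parse_pairs(text)
for key, value in fields.items():
    print(f"{key} -> {value}")
```

The bare "# SysD" inside the #msg value is kept as part of the value because it is not followed by a key and an = sign, so the lookahead does not stop there.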

import re

s = "Oct 23 13:03:03.714012 prod1_xyz(RSVV)[201]: #msgtype=EVENT #server=Web/Dev@server1web #func=LKZ_WriteData ( line 2992 ) #rc=0 #msgid=XYZ0064 #reqid=0 #msg=Web Activity end (section 200, # SysD 1, Files 222, Bytes 343422089928, Errors 0, Aborted Files 0, Busy Files 0)"

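# Match '#key=' and then the value up to the next ' #key=' or the end of the string.
a = re.findall(r'#(?=[a-zA-Z]+=).+?=.*?(?= #[a-zA-Z]+=|$)', s)

# Split on the first '=' only, so a value containing '=' stays in one piece.
result = [item.split('=', 1) for item in a]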

print(result)

Gives:

[['#msgtype', 'EVENT'], ['#server', 'Web/Dev@server1web'], ['#func', 'LKZ_WriteData ( line 2992 )'], ['#rc', '0'], ['#msgid', 'XYZ0064'], ['#reqid', '0'], ['#msg', 'Web Activity end (section 200, # SysD 1, Files 222, Bytes 343422089928, Errors 0, Aborted Files 0, Busy Files 0)']]
