I am trying to clean up some logs and want to extract general information from the messages. I am new to Python and only learned regular expressions yesterday, and now I have a problem.

My messages look like this:

 Report ZSIM_RANDOM_DURATION_ started
 Report ZSIM_SYSTEM_ACTIVITY started
 Report /BDL/TASK_SCHEDULER started
 Report ZSIM_JOB_CREATE started
 Report RSBTCRTE started
 Report SAPMSSY started
 Report RSRZLLG_ACTUAL started
 Report RSRZLLG started
 Report RGWMON_SEND_NILIST started

I tried this code:

clean_special2=re.sub(r'^[Report] [^1-9] [started]','',text)

but I think this code will remove all rows. However, I want to keep the format Report ... started, so I only want to remove the job name in the middle.

I expect my output to look like this:

Report started

Can anyone help me with an idea? Thank you very much!

  • I'm not sure I understand. The way you describe this, couldn't you just write a new file with the same number of lines, each containing Report started? You haven't made any case for why you need regex. Commented Nov 10, 2016 at 18:24
  • Sorry, I think I didn't make it clear. There are also some other messages like "Logon successful" and "RFC/CPIC logon successful", so I only showed the messages I want to clean up. Commented Nov 10, 2016 at 18:27

3 Answers

Try something like this:

clean_special2=re.sub(r'(?<=^Report\b).*(?=\bstarted)',' ',text)

Explanation: (?<=...) is a positive lookbehind, i.e. the text before the match must match the content of this group, but it is not part of the match and thus not replaced. The same applies on the other side with the positive lookahead (?=...). The \b is a word boundary, so everything between these two words is matched. Since this also removes the surrounding whitespace, the replacement is a single space.
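
As a minimal sketch of the lookaround approach described above, the snippet below applies the pattern to each message separately. The `messages` list is made up for illustration from the question's samples plus one unrelated message that should stay untouched.

```python
import re

# Sample messages from the question, plus one unrelated message that must survive.
messages = [
    "Report ZSIM_RANDOM_DURATION_ started",
    "Report /BDL/TASK_SCHEDULER started",
    "Logon successful",
]

# The lookbehind keeps "Report" and the lookahead keeps "started";
# only the text between them is matched and replaced.
pattern = re.compile(r"(?<=^Report\b).*(?=\bstarted)")

cleaned = [pattern.sub(" ", message) for message in messages]
print(cleaned)
# ['Report started', 'Report started', 'Logon successful']
```

Because both keywords sit inside lookarounds, they never need to be retyped in the replacement string.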

4 Comments

See the comment by the OP, I think this is not general enough.
@sobek I read the comment, but I think it is general enough: it will only remove the report jobs but not touch any other message when applied to each message.
Thank you very much for your help and explanation! It works for my case. But I am still confused about one thing: what is the ^ in (?<=^Report\b) used for? Does it indicate the initial position for the pattern matching? Thank you!
@zihanmeng Yes exactly, the ^ requires the match to be at the beginning of the string (or of each line, if you pass re.MULTILINE). So if you had a string like External Report SOMETHING started it would not match.
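
To illustrate the role of ^ discussed in these comments, here is a small sketch. The multi-line `log` string is a made-up example; it assumes the whole log is held in one string, in which case `re.MULTILINE` is needed for ^ to match at the start of every line.

```python
import re

pattern = r"(?<=^Report\b).*(?=\bstarted)"

# "Report" is not at the start of the string, so nothing is replaced.
unchanged = re.sub(pattern, " ", "External Report SOMETHING started")

# Hypothetical log held in a single string.
log = "Report RSBTCRTE started\nLogon successful\nReport SAPMSSY started"

# Without re.MULTILINE, ^ matches only at the very start of the string,
# so only the first line is cleaned.
first_only = re.sub(pattern, " ", log)

# With re.MULTILINE, ^ also matches right after each newline.
every_line = re.sub(pattern, " ", log, flags=re.MULTILINE)

print(unchanged)
print(first_only)
print(every_line)
```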
I don't know the Python syntax, but I am sure this regexp can help you match your string:

/^Report\W+([\w&.#@%^!~-]+)\W+started/m

The Python code might look like this, replacing the whole match with the text you want to keep:
text = "Report ZSIM_RANDOM_DURATION_ started"

clean_special2=re.sub(r'^Report\W+([\w&.#@%^!~-]+)\W+started','Report started',text)

This should do... '^Report\ [^\ ]*\ started'

Regex is black magic, only use it when you have to. Online tools make it much easier to write: https://regex101.com/
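
A rough sketch of how the pattern from this answer could be used with re.sub. The `clean` helper and the fixed replacement text "Report started" are assumptions for illustration, since the answer only gives the pattern itself.

```python
import re

# Pattern from the answer: "Report", one space, a job name without spaces,
# one space, "started".
pattern = re.compile(r"^Report\ [^\ ]*\ started")


def clean(message):
    # Assumption: the whole match is rewritten to the fixed text "Report started".
    return pattern.sub("Report started", message)


print(clean("Report RSRZLLG_ACTUAL started"))
print(clean("Report /BDL/TASK_SCHEDULER started"))
print(clean("RFC/CPIC logon successful"))
```

Note that a job name containing a space would not match this pattern and would be left as-is.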

4 Comments

Thank you for your help and for letting me know about this site!
@zihanmeng Re-reading this I'm still not sure that I understand your question correctly. Where do the "other messages" you mention occur? On their own lines, or somewhere within the same string? An example that includes these "other messages" which you need to preserve would be useful.
@technicalbloke Regex is not black magic at all... it is a powerful tool which one should use where appropriate, that is, for most non-trivial pattern-matching tasks on text.
@Lucero, so it's a powerful tool you should use with caution, a bit like black magic then? ;)