3

I have a mail log file that looks like this:

Aug 15 00:01:06 **** sm-mta*** to=<[email protected]>,<[email protected]>,[email protected], some_more_stuff
Aug 16 13:16:09 **** sendmail*** to=<[email protected]>, some_more_stuff
Aug 17 11:14:48 **** sm-mta*** to=<[email protected]>,<[email protected]>, some_more_stuff

What I want is a list of all mail hosts in lines that contain "sm-mta". In this case that would be: ['gmail.com', 'yahoo.com', 'aol.com', 'gmail.com', 'gmail.com']

re.findall(r'sm-mta.*to=.+?@(.*?)[>, ]') returns only the first host of each matching line (['gmail.com', 'gmail.com']).

re.findall(r'.+?@(.*?)[>, ]') returns every host, but I need the "sm-mta" filtering too. Is there a workaround for this?

2 Answers

3

If you cannot use the PyPI regex library, you will have to do it in two steps: 1) grab the lines that contain sm-mta and 2) extract the values you need from them, with something like

import re

txt="""Aug 15 00:01:06 **** sm-mta*** to=<[email protected]>,<[email protected]>,[email protected], some_more_stuff
Aug 16 13:16:09 **** sendmail*** to=<[email protected]>, some_more_stuff
Aug 17 11:14:48 **** sm-mta*** to=<[email protected]>,<[email protected]>, some_more_stuff"""
rx = r'@([^\s>,]+)'
filtered_lines = [x for x in txt.split('\n') if 'sm-mta' in x]
print(re.findall(rx, " ".join(filtered_lines)))

See the Python demo online. The @([^\s>,]+) pattern matches @ and then captures one or more chars other than whitespace, > and ,. Since the pattern has a single capturing group, re.findall returns only the captured text.
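
For an actual log file it may be more convenient to filter and extract line by line instead of joining everything into one string. Below is a minimal sketch that uses only the standard re module. The function name hosts_for, its tag parameter and the sample addresses are all made up for illustration (the addresses in the post are obscured), so treat it as one possible arrangement of the two steps rather than the only way to do it.

```python
import re

# Same pattern as above: capture what follows "@" up to whitespace, ">" or ","
HOST_RX = re.compile(r'@([^\s>,]+)')

def hosts_for(lines, tag='sm-mta'):
    """Return the mail hosts from every line that contains `tag` (hypothetical helper)."""
    hosts = []
    for line in lines:
        if tag in line:                          # step 1: keep only matching lines
            hosts.extend(HOST_RX.findall(line))  # step 2: pull out every host
    return hosts

# Hypothetical sample lines shaped like the log in the question
sample = [
    "Aug 15 00:01:06 **** sm-mta*** to=<alice@gmail.com>,<bob@yahoo.com>,carol@aol.com, some_more_stuff",
    "Aug 16 13:16:09 **** sendmail*** to=<dave@gmail.com>, some_more_stuff",
    "Aug 17 11:14:48 **** sm-mta*** to=<erin@gmail.com>,<frank@gmail.com>, some_more_stuff",
]
print(hosts_for(sample))
```

Because the function accepts any iterable of lines, an open file object can be passed in directly, which avoids reading the whole log into memory.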

If you can use the PyPI regex library, you can get the list of strings you need in a single pass with

>>> import regex
>>> x="""Aug 15 00:01:06 **** sm-mta*** to=<[email protected]>,<[email protected]>,[email protected], some_more_stuff
Aug 16 13:16:09 **** sendmail*** to=<[email protected]>, some_more_stuff
Aug 17 11:14:48 **** sm-mta*** to=<[email protected]>,<[email protected]>, some_more_stuff"""
>>> rx = r'(?:^(?=.*sm-mta)|\G(?!^)).*?@\K[^\s>,]+'
>>> print(regex.findall(rx, x, regex.M))
['gmail.com', 'yahoo.com', 'aol.com', 'gmail.com', 'gmail.com']

See the Python online demo and a regex demo.

Pattern details

  • (?:^(?=.*sm-mta)|\G(?!^)) - either the start of a line that contains the sm-mta substring (after any 0+ chars other than line break chars), or the place where the previous match ended, provided that place is not the start of a line
  • .*?@ - any 0+ chars other than line break chars, as few as possible, up to the first @, and the @ itself
  • \K - a match reset operator that discards all the text matched so far in the current iteration, so only what follows is returned
  • [^\s>,]+ - 1 or more chars other than whitespace, , and >

1

Try the regex module.

x="""Aug 15 00:01:06 **** sm-mta*** to=<[email protected]>,<[email protected]>,[email protected], some_more_stuff
Aug 16 13:16:09 **** sendmail*** to=<[email protected]>, some_more_stuff
Aug 17 11:14:48 **** sm-mta*** to=<[email protected]>,<[email protected]>, some_more_stuff"""
import regex
print(regex.findall(r"sm-mta.*to=\K|\G(?!^).+?@(.*?)[>, ]", x, flags=regex.V1))

Output: ['', 'gmail.com', 'yahoo.com', 'aol.com', '', 'gmail.com', 'gmail.com']

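Just ignore the empty strings: there is one at the start of each matching line, where the first alternative matches and captures nothing.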
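
Dropping the empty entries takes one list comprehension. A small sketch, assuming the result is exactly the list shown in the output above. The list is pasted in as a literal so the snippet runs without the regex module, and the variable names are illustrative.

```python
# Result copied from the output above; each '' comes from the
# "sm-mta.*to=\K" alternative, which captures nothing
result = ['', 'gmail.com', 'yahoo.com', 'aol.com', '', 'gmail.com', 'gmail.com']

# Keep only the non-empty captures
hosts = [h for h in result if h]
print(hosts)
```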

https://regex101.com/r/7zPc6j/1
