I am trying to write a regex in Python that should match across multiple lines.

([0-9]{2}\.[0-9]{2}\.[0-9]{4}\s[0-9]{2}:[0-9]{2}:[0-9]{2}).*?\r\n-{1,}\sFG\s{3,}?4

That is my regular expression, and this is my Python call:

re.findall("([0-9]{2}\.[0-9]{2}\.[0-9]{4}\s[0-9]{2}:[0-9]{2}:[0-9]{2}).*?\r\n-{1,}\sFG\s{3,}?4.*?", content, flags=re.M)

However, when I use the regex in Notepad++, for example, it gives me the proper matches, whereas in Python it does not match anything at all. Here is an example string that is matched in Notepad++ but not in Python:

19.04.2016 01:59:18 ASDF

---- FG 3

 --------------- ASDF

19.04.2016 01:59:21 ASDF

---- FG 4

 --------------- ASDF

19.04.2016 01:59:22 ASDF

---- FG 4

 --------------- ASDF

I am also sure that there is in fact a \r\n, since Notepad++ gives me matches.

Since I am using the multiline flag, I have absolutely no idea why my regex won't work.

  • Could you provide a clearer example of the input? Commented May 30, 2016 at 12:47
  • Please provide more examples of your string content; a single line like this is not really helpful, and it also doesn't match. Commented May 30, 2016 at 12:47

4 Answers

Note that with the corrected input shown, the FG\s{3,}?4 part of the pattern prevents a match: \s{3,}? requires at least three whitespace characters, so the single space between FG and the 4 does not match.

#! /usr/bin/env python
from __future__ import print_function
import re

content = "19.04.2016 05:31:03 ASDFASDF\r\n---- FG 4 "
pattern = (r'([0-9]{2}\.[0-9]{2}\.[0-9]{4}\s[0-9]{2}:[0-9]{2}:[0-9]{2}).*?'
           r'\r\n-{1,}\sFG\s{1,}?4.*?')
print(re.findall(pattern, content, flags=re.M))

This gives me (unmodified, with Python 2.7.11 and 3.5.1):

['19.04.2016 05:31:03']
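To see the effect of the quantifier in isolation, here is a minimal sketch. The names TIMESTAMP, strict and relaxed and the two sample strings are made up for illustration; only the timestamp part and the two quantifiers come from the question and from the code above.

```python
import re

# Timestamp part copied from the question; only the quantifier after "FG" differs.
TIMESTAMP = r'([0-9]{2}\.[0-9]{2}\.[0-9]{4}\s[0-9]{2}:[0-9]{2}:[0-9]{2})'
strict = re.compile(TIMESTAMP + r'.*?\r\n-{1,}\sFG\s{3,}?4')   # at least 3 whitespace chars
relaxed = re.compile(TIMESTAMP + r'.*?\r\n-{1,}\sFG\s{1,}?4')  # at least 1 whitespace char

# Hypothetical inputs: one space vs. three spaces between "FG" and "4".
one_space = "19.04.2016 01:59:21 ASDF\r\n---- FG 4"
three_spaces = "19.04.2016 01:59:21 ASDF\r\n---- FG   4"

strict_one = strict.findall(one_space)        # no match: only one space present
relaxed_one = relaxed.findall(one_space)      # matches
strict_three = strict.findall(three_spaces)   # matches
relaxed_three = relaxed.findall(three_spaces) # matches

print(strict_one, relaxed_one, strict_three, relaxed_three)
```

Only the strict pattern on the single-space input comes back empty, which is consistent with the spacing being the culprit for that sample.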

Edit: Here is a version for the updated input samples, as transcribed by @poke:

#! /usr/bin/env python
from __future__ import print_function
import re

content = ("19.04.2016 05:31:03  ASDFASDF\r\n---- FG   4"
           "\r\n19.04.2016 05:31:03  ASDFASDF\r\n---- FG   4"
           "\r\n19.04.2016 05:31:03  ASDFASDF\r\n---- FG   4"
           "\r\n19.04.2016 05:31:03  ASDFASDF\r\n---- FG   4"
           "\r\n19.04.2016 05:31:03  ASDFASDF\r\n---- FG   4")
pattern = (r'([0-9]{2}\.[0-9]{2}\.[0-9]{4}\s[0-9]{2}:[0-9]{2}:[0-9]{2}).*?'
           r'\r\n-{1,}\sFG\s{1,}?4.*?')
print(re.findall(pattern, content, flags=re.M))

This gives, as expected:

['19.04.2016 05:31:03', '19.04.2016 05:31:03', '19.04.2016 05:31:03', '19.04.2016 05:31:03', '19.04.2016 05:31:03']

If your input contains a line break with \r\n and you correct the spacing after the 'FG' part, this should work:

([0-9]{2}\.[0-9]{2}\.[0-9]{4}\s[0-9]{2}:[0-9]{2}:[0-9]{2}).*?\r\n-{1,}\sFG\s+?4

Tested here (with \n only for the line break): https://regex101.com/r/iT1rF2/2
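If it is unclear whether the data really contains \r\n or only \n, one possible variation (my assumption, not part of the pattern above) is to make the carriage return optional with \r?\n. The sketch below uses invented sample text modelled on the question. It also leaves out re.M, since that flag only changes how ^ and $ behave and the pattern uses neither.

```python
import re

# Same pattern as above, but with an optional carriage return (\r?\n).
# This variation is an assumption for illustration, not the asker's pattern.
pattern = re.compile(
    r'([0-9]{2}\.[0-9]{2}\.[0-9]{4}\s[0-9]{2}:[0-9]{2}:[0-9]{2}).*?\r?\n-{1,}\sFG\s+?4'
)

# Hypothetical sample with Windows line endings, and a copy with Unix line endings.
windows_text = ("19.04.2016 01:59:18 ASDF\r\n---- FG 3\r\n"
                "19.04.2016 01:59:21 ASDF\r\n---- FG 4\r\n")
unix_text = windows_text.replace("\r\n", "\n")

windows_hits = pattern.findall(windows_text)
unix_hits = pattern.findall(unix_text)

print(windows_hits)
print(unix_hits)
```

Because `.` does not match \n without re.S, the first timestamp cannot reach across to the later "FG 4" line, so only the second timestamp is reported for both inputs.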

1 Comment

@poke The site I linked tests it as a Python expression.

Works for me:

>>> content = '''19.04.2016 05:31:03  ASDFASDF\r
---- FG   4\r
19.04.2016 05:31:03  ASDFASDF\r
---- FG   4\r
19.04.2016 05:31:03  ASDFASDF\r
---- FG   4\r
19.04.2016 05:31:03  ASDFASDF\r
---- FG   4\r
19.04.2016 05:31:03  ASDFASDF\r
---- FG   4'''
>>> re.findall(r"([0-9]{2}\.[0-9]{2}\.[0-9]{4}\s[0-9]{2}:[0-9]{2}:[0-9]{2}).*?\r\n-{1,}\sFG\s{3,}?4.*?", content, flags=re.M)
['19.04.2016 05:31:03', '19.04.2016 05:31:03', '19.04.2016 05:31:03', '19.04.2016 05:31:03', '19.04.2016 05:31:03']

Note that I had to add explicit \r at the end of each line. You say that your text contains actual \r\n but please make sure that’s the case.

If you are reading the content from a file, note that Python normalizes newlines when you open a file in text mode. So you likely end up with only \n, although the file originally contained \r\n.
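The newline normalization can be checked with a small sketch. It assumes Python 3, where open() in text mode defaults to newline=None and translates \r\n to \n on reading, while newline="" turns the translation off. The temporary file and its content are made up for illustration.

```python
import os
import re
import tempfile

# Hypothetical file content with Windows line endings.
raw = b"19.04.2016 01:59:21 ASDF\r\n---- FG   4\r\n"
pattern = r'([0-9]{2}\.[0-9]{2}\.[0-9]{4}\s[0-9]{2}:[0-9]{2}:[0-9]{2}).*?\r\n-{1,}\sFG\s{3,}?4'

# Write the bytes unchanged to a temporary file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(raw)
    path = tmp.name

try:
    with open(path, encoding="ascii") as f:              # default: \r\n becomes \n
        translated = f.read()
    with open(path, encoding="ascii", newline="") as f:  # newline="": \r\n is kept
        untranslated = f.read()
finally:
    os.remove(path)

default_matches = re.findall(pattern, translated)
preserved_matches = re.findall(pattern, untranslated)

print(repr(translated))
print(default_matches, preserved_matches)
```

With the default mode the pattern finds nothing because the \r is gone; with newline="" the same pattern matches. Reading in binary mode or writing \r?\n in the pattern would be other ways around it.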


Note that you could re-write your regex with shortcuts so that your pattern:

([0-9]{2}\.[0-9]{2}\.[0-9]{4}\s[0-9]{2}:[0-9]{2}:[0-9]{2}).*?\r\n-{1,}\sFG\s{3,}?4

becomes (with shortcuts and corrections):

(\d{2}\.\d{2}\.\d{4}\s\d{2}:\d{2}:\d{2}).*?\r\n-+\sFG\s+?4
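A quick comparison of the two patterns, as a sketch on invented sample text (the variable names are mine). One caveat: in Python 3, \d on str patterns also matches non-ASCII Unicode digits unless re.ASCII is passed, so it is slightly broader than [0-9].

```python
import re

original = r'([0-9]{2}\.[0-9]{2}\.[0-9]{4}\s[0-9]{2}:[0-9]{2}:[0-9]{2}).*?\r\n-{1,}\sFG\s{3,}?4'
shortened = r'(\d{2}\.\d{2}\.\d{4}\s\d{2}:\d{2}:\d{2}).*?\r\n-+\sFG\s+?4'

# Hypothetical sample: "FG 3", "FG 4" with one space, "FG   4" with three spaces.
sample = ("19.04.2016 01:59:18 ASDF\r\n---- FG 3\r\n"
          "19.04.2016 01:59:21 ASDF\r\n---- FG 4\r\n"
          "19.04.2016 01:59:22 ASDF\r\n---- FG   4\r\n")

original_hits = re.findall(original, sample)    # only the three-space entry
shortened_hits = re.findall(shortened, sample)  # both "FG 4" entries

print(original_hits)
print(shortened_hits)
```

So the shortened version is not only more compact; the \s+? correction also changes which lines match.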

1 Comment

Good point! Readability counts, especially in embedded "languages" of the more cryptic kind ... and there is still a wild jungle to discover in "regex land". Example: matching a timestamp without accepting too many invalid entries is a lengthy construct in itself. The pattern in the question accepts months like 99, hours like 42, and so on.
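One way to avoid a lengthy validating regex, sketched here as a suggestion rather than anything proposed in the thread, is to keep the loose pattern and let datetime.strptime reject impossible values afterwards. The helper name valid_timestamps and the sample text are hypothetical.

```python
import re
from datetime import datetime

# Loose pattern: accepts any two-digit day, month, hour, and so on.
timestamp_re = re.compile(r'(\d{2}\.\d{2}\.\d{4}\s\d{2}:\d{2}:\d{2})')


def valid_timestamps(text):
    """Return captured timestamps that parse as real dates and times."""
    result = []
    for candidate in timestamp_re.findall(text):
        try:
            datetime.strptime(candidate, "%d.%m.%Y %H:%M:%S")
        except ValueError:
            continue  # e.g. month 99 or hour 42
        result.append(candidate)
    return result


# Hypothetical input with one plausible and one impossible timestamp.
sample = "19.04.2016 01:59:21 ok\r\n31.99.2016 42:00:00 bogus\r\n"
checked = valid_timestamps(sample)
print(checked)
```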
