Regex Multiline Python

Question

I am currently trying to write a regex in Python that should match across multiple lines.

([0-9]{2}\.[0-9]{2}\.[0-9]{4}\s[0-9]{2}:[0-9]{2}:[0-9]{2}).*?\r\n-{1,}\sFG\s{3,}?4

is my regular expression, and this is my Python call:

re.findall("([0-9]{2}\.[0-9]{2}\.[0-9]{4}\s[0-9]{2}:[0-9]{2}:[0-9]{2}).*?\r\n-{1,}\sFG\s{3,}?4.*?", content, flags=re.M)

However, when I try the regex in Notepad++, for example, it finds the proper matches, whereas in Python it does not match anything at all. Here is an example string that is matched in Notepad++ but not in Python:

19.04.2016 01:59:18 ASDF

---- FG 3

 --------------- ASDF

19.04.2016 01:59:21 ASDF

---- FG 4

 --------------- ASDF

19.04.2016 01:59:22 ASDF

---- FG 4

 --------------- ASDF

I am also sure that there in fact is a \r\n since npp provides me with matches.

Since I am using the multiline flag I have absolutely no idea why my regex won't work.

Please provide more examples of your string content; a single line like this is not really helpful, and it doesn't match either.
– poke, Commented May 30, 2016 at 12:47

Dilettant · Accepted Answer · 2016-05-30 13:07:14Z

Note that in the corrected input shown, the part FG\s{3,}?4 of the pattern prevents a match: \s{3,}? requires at least three whitespace characters, but the input has only a single space between FG and the 4.

#! /usr/bin/env python
from __future__ import print_function
import re    

content = "19.04.2016 05:31:03 ASDFASDF\r\n---- FG 4 "
pattern = (r'([0-9]{2}\.[0-9]{2}\.[0-9]{4}\s[0-9]{2}:[0-9]{2}:[0-9]{2}).*?'
           r'\r\n-{1,}\sFG\s{1,}?4.*?')
print(re.findall(pattern, content, flags=re.M))

gives me (unmodified with python 2.7.11 and 3.5.1):

['19.04.2016 05:31:03']
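
To make the difference between the two quantifiers visible, here is a small side-by-side sketch. The variable names (TIMESTAMP, strict, relaxed, one_space, three_spaces) are only illustrative, and the sample strings are assumptions modelled on the input above. The re.M flag is left out on purpose: it only changes how ^ and $ behave, and this pattern uses neither.

```python
import re

# Shared timestamp prefix, copied from the pattern in the question.
TIMESTAMP = r'([0-9]{2}\.[0-9]{2}\.[0-9]{4}\s[0-9]{2}:[0-9]{2}:[0-9]{2})'

# Original quantifier: at least three whitespace characters between FG and 4.
strict = TIMESTAMP + r'.*?\r\n-{1,}\sFG\s{3,}?4'
# Relaxed quantifier: at least one whitespace character.
relaxed = TIMESTAMP + r'.*?\r\n-{1,}\sFG\s{1,}?4'

# Hypothetical sample inputs: one space vs. three spaces after FG.
one_space = "19.04.2016 05:31:03 ASDFASDF\r\n---- FG 4 "
three_spaces = "19.04.2016 05:31:03 ASDFASDF\r\n---- FG   4"

strict_on_one = re.findall(strict, one_space)
relaxed_on_one = re.findall(relaxed, one_space)
strict_on_three = re.findall(strict, three_spaces)

print(strict_on_one)    # no match: only one space after FG
print(relaxed_on_one)   # matches
print(strict_on_three)  # matches: three spaces satisfy \s{3,}?
```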

Edit: Here is a version for the amended input samples as transcribed by @poke:

#! /usr/bin/env python
from __future__ import print_function
import re

content = ("19.04.2016 05:31:03  ASDFASDF\r\n---- FG   4"
           "\r\n19.04.2016 05:31:03  ASDFASDF\r\n---- FG   4"
           "\r\n19.04.2016 05:31:03  ASDFASDF\r\n---- FG   4"
           "\r\n19.04.2016 05:31:03  ASDFASDF\r\n---- FG   4"
           "\r\n19.04.2016 05:31:03  ASDFASDF\r\n---- FG   4")
pattern = (r'([0-9]{2}\.[0-9]{2}\.[0-9]{4}\s[0-9]{2}:[0-9]{2}:[0-9]{2}).*?'
           r'\r\n-{1,}\sFG\s{1,}?4.*?')
print(re.findall(pattern, content, flags=re.M))

Gives (as expected):

['19.04.2016 05:31:03', '19.04.2016 05:31:03', '19.04.2016 05:31:03', '19.04.2016 05:31:03', '19.04.2016 05:31:03']

Alex Garcia · Accepted Answer · 2016-05-30 12:50:45Z

If your input contains a line break with \r\n and you correct the spacing after the 'FG' part, this should work:

([0-9]{2}\.[0-9]{2}\.[0-9]{4}\s[0-9]{2}:[0-9]{2}:[0-9]{2}).*?\r\n-{1,}\sFG\s+?4

Tested here (with \n only for the line break): https://regex101.com/r/iT1rF2/2

Alex Garcia Over a year ago

@poke The site I linked is testing a python expression

poke · Accepted Answer · 2016-05-30 12:54:02Z

Works for me:

>>> import re
>>> content = '''19.04.2016 05:31:03  ASDFASDF\r
---- FG   4\r
19.04.2016 05:31:03  ASDFASDF\r
---- FG   4\r
19.04.2016 05:31:03  ASDFASDF\r
---- FG   4\r
19.04.2016 05:31:03  ASDFASDF\r
---- FG   4\r
19.04.2016 05:31:03  ASDFASDF\r
---- FG   4'''
>>> re.findall(r"([0-9]{2}\.[0-9]{2}\.[0-9]{4}\s[0-9]{2}:[0-9]{2}:[0-9]{2}).*?\r\n-{1,}\sFG\s{3,}?4.*?", content, flags=re.M)
['19.04.2016 05:31:03', '19.04.2016 05:31:03', '19.04.2016 05:31:03', '19.04.2016 05:31:03', '19.04.2016 05:31:03']

Note that I had to add explicit \r at the end of each line. You say that your text contains actual \r\n but please make sure that’s the case.

If you are reading the content from a file, note that Python performs a newline normalization when you open a file. So you likely end up with only \n although the file originally contained \r\n.
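
A minimal sketch of that normalization, assuming Python 3 (where open() in text mode translates \r\n to \n by default). The temporary file, the sample bytes, and the names pattern_crlf and pattern_any are made up for illustration. Passing newline='' to open() turns the translation off; alternatively, \r?\n in the pattern accepts both kinds of line ending.

```python
import os
import re
import tempfile

# Hypothetical sample data with Windows line endings, written as raw bytes.
raw = b"19.04.2016 01:59:21 ASDF\r\n---- FG   4\r\n"

with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(raw)
    path = tmp.name

try:
    # Default text mode: \r\n is translated to \n while reading.
    with open(path) as f:
        translated = f.read()
    # newline='' disables the translation, so \r\n survives.
    with open(path, newline='') as f:
        untranslated = f.read()
finally:
    os.remove(path)

# Pattern that insists on \r\n, and a variant that makes the \r optional.
pattern_crlf = r'(\d{2}\.\d{2}\.\d{4}\s\d{2}:\d{2}:\d{2}).*?\r\n-+\sFG\s{3,}?4'
pattern_any = r'(\d{2}\.\d{2}\.\d{4}\s\d{2}:\d{2}:\d{2}).*?\r?\n-+\sFG\s{3,}?4'

print(repr(translated))                       # contains \n only
print(re.findall(pattern_crlf, translated))   # no match
print(re.findall(pattern_crlf, untranslated)) # matches
print(re.findall(pattern_any, translated))    # matches
```

If you do not control how the file is opened, the \r?\n variant is probably the less fragile choice.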

mquantin · Accepted Answer · 2016-05-30 13:47:33Z

Note that you could re-write your regex with shortcuts so that your pattern:

([0-9]{2}\.[0-9]{2}\.[0-9]{4}\s[0-9]{2}:[0-9]{2}:[0-9]{2}).*?\r\n-{1,}\sFG\s{3,}?4

becomes (shortcut and corrections):

(\d{2}\.\d{2}\.\d{4}\s\d{2}:\d{2}:\d{2}).*?\r\n-+\sFG\s+?4
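
A quick comparison of the two patterns, on a made-up sample (the content string and variable names are assumptions, not the asker's real data). It suggests that the \d and -+ shortcuts are pure rewrites for ASCII input, while \s+? is a behavioural change: it also accepts a single space after FG.

```python
import re

original = (r'([0-9]{2}\.[0-9]{2}\.[0-9]{4}\s[0-9]{2}:[0-9]{2}:[0-9]{2}).*?'
            r'\r\n-{1,}\sFG\s{3,}?4')
shortened = r'(\d{2}\.\d{2}\.\d{4}\s\d{2}:\d{2}:\d{2}).*?\r\n-+\sFG\s+?4'

# Hypothetical input: FG 3 (never matches), FG with three spaces, FG with one space.
content = ("19.04.2016 01:59:18 ASDF\r\n---- FG   3\r\n"
           "19.04.2016 01:59:21 ASDF\r\n---- FG   4\r\n"
           "19.04.2016 01:59:22 ASDF\r\n---- FG 4\r\n")

original_hits = re.findall(original, content)
shortened_hits = re.findall(shortened, content)

print(original_hits)   # only the entry with three spaces after FG
print(shortened_hits)  # both FG 4 entries, including the single-space one
```

One caveat: in Python 3, \d in a str pattern also matches non-ASCII decimal digits unless re.ASCII is set, so it is slightly broader than [0-9].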

Dilettant Over a year ago

Good point! Readability counts esp. in embedded "languages" of the more cryptic kind ... and there is still a wild jungle to discover in "regex land". Example: Matching a time stamp and not accepting too many invalid entries is a lengthy construct in itself. In our question's pattern months like 99 and hours like 42 etc. are being accepted.
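
One possible way to reject such entries without a much longer regex is to let the pattern find candidates and then hand them to datetime.strptime for validation. This is only a sketch: the helper name parse_timestamp and the sample text are made up, and the format string assumes the day.month.year layout seen in the question.

```python
import re
from datetime import datetime

# Loose pattern: accepts any digits in the right shape.
pattern = r'(\d{2}\.\d{2}\.\d{4}\s\d{2}:\d{2}:\d{2})'

# Hypothetical sample: one plausible timestamp, one impossible one.
text = "19.04.2016 01:59:21 ASDF\r\n99.99.2016 42:00:00 ASDF\r\n"
candidates = re.findall(pattern, text)


def parse_timestamp(value):
    """Return a datetime for a valid timestamp, or None if it is out of range."""
    try:
        return datetime.strptime(value, "%d.%m.%Y %H:%M:%S")
    except ValueError:
        return None


valid = [c for c in candidates if parse_timestamp(c) is not None]

print(candidates)  # both strings have the right shape
print(valid)       # only the one that is a real date and time
```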
