Consider the following (highly simplified) string:
'a b a b c a b c a b c'
This is a repeating pattern of 'a b c' except at the beginning where the 'c' is missing.
I am looking for a regular expression that gives me the following matches with re.findall():
[('a', 'b'), ('a', 'b', 'c'), ('a', 'b', 'c'), ('a', 'b', 'c')]
The string above thus has 4 matches of 'a b c', although the first match is a special case since the 'c' is missing.
My simplest attempt is where I try to capture 'a' and 'b' and use an optional capture for 'c':
re.findall(r'(a).*?(b).*?(c)?', 'a b a b c a b c a b c')
I get:
[('a', 'b', ''), ('a', 'b', ''), ('a', 'b', ''), ('a', 'b', '')]
Clearly, it has just ignored the c. When using non-optional capture for 'c' the search skips ahead prematurely and misses 'a' and 'b' in the second 'a b c'-substring. This results in 3 wrong matches:
[('a', 'b', 'c'), ('a', 'b', 'c'), ('a', 'b', 'c')]
I have tried several other techniques (for instance, '(?<=c)') to no avail.
Note: The string above is just a skeleton example of my "real-world" problem where the three letters above are themselves strings (from a long log-file) in between other strings and newlines from which I need to extract named groups.
I use Python 3.5.2 on Windows 7.
re.findall does its job. The problem is the .*? wildcard in between a, b, and c. For starters, try using .+? instead so that the lazy operator doesn't cause it to match zero characters and start the pattern over again.
^ab|abc
Example:
x = "ababcabcabc"
stringr::str_extract_all(x, "^ab|abc")
[1] "ab" "abc" "abc" "abc"
Not sure how that is implemented in Python.
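For what it's worth, here is a minimal Python sketch of both suggestions. Note that switching to .+? on its own does not seem to be enough: an optional, lazy filler before 'c' can still run across the next 'a' and pull in the 'c' of the following group. The sketch therefore adds a negative lookahead, (?!a), so the filler never steps over the start of the next group. That lookahead is my own addition, not something from the answers, and it assumes the 'a' marker never appears inside a group in the real log. The names text, pattern, raw, matches, and compact are purely illustrative.

```python
import re

text = 'a b a b c a b c a b c'

# (?:(?!a).)+? lazily consumes at least one character but refuses to step
# over the next 'a', so a match cannot spill into the following group.
pattern = re.compile(
    r'(a)(?:(?!a).)+?(b)'      # 'a', some filler, 'b'
    r'(?:(?:(?!a).)+?(c))?',   # optionally: some filler, then 'c'
    re.DOTALL,                 # assumption: filler may span newlines
)

raw = pattern.findall(text)

# findall reports an unmatched optional group as '', so drop the empties.
matches = [tuple(g for g in groups if g) for groups in raw]
print(matches)

# Python counterpart of the stringr call, on the space-free string.
compact = re.findall(r'^ab|abc', 'ababcabcabc')
print(compact)
```

With named groups, re.finditer together with m.groupdict() should work along the same lines; a group that did not participate in the match is then None rather than ''.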