Regex does not match expected output in python [duplicate]

Question

While writing a program to detect repeating patterns in binary, I came across a weird case where a regex does not seem to match properly in Python.

The regex is run as follows:

pattern = re.compile("^0b(1*)(0*)(\1\2)*(\1)?$")
result = pattern.match("0b101")

What I would expect to see is the following matching groups:

1: '1'
2: '0'
3: empty
4: '1'

But instead I get no match at all. According to the website regex101, the match should be as expected, but Python seems to disagree.

Is there a difference between the regex interpreters in Python and on the website, or is it just some small mistake I'm missing?

Is this what you are after? stackoverflow.com/questions/5618988/…
– panoskarajohn, Commented Dec 3, 2019 at 10:08
First off, you're not escaping your backslashes... you might want to try a raw string, e.g. r"^0b(1*)(0*)(\1\2)*(\1)?$", which will then match your entire string, but then you still need to group accordingly.
– Jon Clements, Commented Dec 3, 2019 at 10:08
The given input doesn't have a 3rd group, but has a first, second and fourth group, because \1\2 doesn't match, and the final \1 does match (the 4th group).
– Maroun, Commented Dec 3, 2019 at 10:09
@JonClements oh man, you're absolutely correct! Seems like that fixed it. Don't know how I missed it haha.
– martijn p, Commented Dec 3, 2019 at 10:13

h4z3 · Accepted Answer · 2019-12-03 10:11:51Z

2

Regarding "and the website":

I'm assuming you created your regex using one of the websites like regex101, right?

If you look closely, regex101 hints that it uses raw strings.

In your case:

pattern = re.compile("^0b(1*)(0*)(\1\2)*(\1)?$")

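Python interprets \1 as an ordinary string escape sequence, like \n. In a normal string literal, \1 is an octal escape for the character with code 1, so the backreference never reaches the regex engine.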
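To see what the non-raw literal actually contains, a quick check along these lines may help. It is only a sketch; the variable name non_raw is made up for illustration.

```python
import re

# A normal (non-raw) string literal: Python resolves \1 and \2 as octal
# escapes before the re module ever sees the pattern.
non_raw = "^0b(1*)(0*)(\1\2)*(\1)?$"

# "\1" is a single character with code point 1, not a backslash plus "1".
print("\1" == "\x01")

# No backslash survived string parsing, so there is no backreference left.
print("\\" in non_raw)
print(repr(non_raw))

# The regex engine now looks for literal control characters, which
# "0b101" does not contain, so match() returns None.
print(re.compile(non_raw).match("0b101"))
```

The repr output shows \x01 and \x02 where the backreferences were meant to be.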

What you need is a backslash that survives string parsing, so that the regex parser can interpret it.

This means either escaping the backslash (\\) or using a raw string, so that Python knows it shouldn't process \n and similar escape sequences.

pattern = re.compile(r"^0b(1*)(0*)(\1\2)*(\1)?$")
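As a small sanity check, the two fixes mentioned above (doubling the backslashes or using a raw string) should produce the same pattern text. The names escaped and raw below are illustrative only.

```python
import re

escaped = "^0b(1*)(0*)(\\1\\2)*(\\1)?$"  # backslashes escaped by hand
raw = r"^0b(1*)(0*)(\1\2)*(\1)?$"        # raw string, no escape processing

# Both literals yield the same characters, including the backslashes.
same = escaped == raw
print(same)

# With the backreferences intact, the input from the question matches.
result = re.compile(raw).match("0b101")
print(result.group(0))
```

The raw string is usually easier to read, which is probably why it is the more common choice for regex patterns.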

answered Dec 3, 2019 at 10:11

h4z3


Maroun · Accepted Answer · 2019-12-03 10:19:54Z

0

The regex ^0b(1*)(0*)(\1\2)*(\1)?$, applied to 0b101, matches the following groups (matches are shown in square brackets):

group 1 - 0b[1]01
group 2 - 0b1[0]1
group 3 - no match, since "10" wasn't encountered
group 4 - 0b10[1] (successfully matches \1, which is a "1")

>>> import re
>>> pattern = re.compile(r"^0b(1*)(0*)(\1\2)*(\1)?$")
>>> result = pattern.match("0b101")
>>> result.groups()
('1', '0', None, '1')
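The None in the third position is how Python reports a group that did not take part in the match. If an empty string is preferred, as in the question's "3: empty", one option is the default argument of groups(). The sketch below also tries a second input, "0b1010", chosen here only for contrast; it is not from the original post.

```python
import re

pattern = re.compile(r"^0b(1*)(0*)(\1\2)*(\1)?$")

# Group 3 repeats zero times for "0b101", so it is reported as None.
result = pattern.match("0b101")
print(result.group(3))

# groups(default="") substitutes "" for groups that did not participate.
print(result.groups(default=""))

# For "0b1010" the \1\2 repetition is used once and the trailing \1 is not.
other = pattern.match("0b1010")
print(other.groups())
```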

edited Dec 3, 2019 at 10:19

answered Dec 3, 2019 at 10:12

Maroun


6 Comments

h4z3 Over a year ago

That's not the problem. The way the code is currently written, .match returns None. You haven't even executed the posted code.

martijn p Over a year ago

Since group 3 is followed by a * quantifier, it should still result in a match, however.

Maroun Over a year ago

@h4z3 "What I would expect to see is the following matching groups: 1: '1' 2: '0' 3: '1'" - I was referencing that comment, referring to the "3" group.

Maroun Over a year ago

@martijnp No, it won't be matched, it'll be discarded. My answer is purely about the regex, not Python, and it was referring to your expectation that the third group would be matched, which is not correct because only groups 1, 2 and 4 are matched.

martijn p Over a year ago

It matches, it just places what I put as group 3 in group 4. I forgot to add that group 3 is empty and just left it out since it doesn't contain anything
