-1

While writing a program to detect repeating patterns in binary I came across a weird instance where a regex does not seem to properly match in python.

The regex is run as follows:

pattern = re.compile("^0b(1*)(0*)(\1\2)*(\1)?$")
result = pattern.match("0b101")

What I would expect to see is the following matching groups:

  • 1: '1'
  • 2: '0'
  • 3: empty
  • 4: '1'

But instead I get no match at all. According to the website regex101, the match should be as I expect, but Python seems to disagree.

Is there a difference between interpreters in python and the website or just some small mistake I'm missing?

4
  • Is this what you are after? stackoverflow.com/questions/5618988/… Commented Dec 3, 2019 at 10:08
  • 3
    First off you're not escaping your backslashes... you might want to try with a raw-string, eg: r"^0b(1*)(0*)(\1\2)*(\1)?$" - which will then match your entire string, but then you still need to group accordingly Commented Dec 3, 2019 at 10:08
  • 1
    The given input doesn't have a 3rd group, but has first, second and fourth group, because \1\2 doesn't match, and the final \1 does match (the 4th group). Commented Dec 3, 2019 at 10:09
  • @JonClements oh man, you're absolutely correct! Seems like that fixed it. Don't know how I missed it haha. Commented Dec 3, 2019 at 10:13

2 Answers

2

"... and the website"

I'm assuming you created your regex using a website like regex101, right?

If you look closely, regex101 hints that it uses raw strings.

In your case:

pattern = re.compile("^0b(1*)(0*)(\1\2)*(\1)?$")

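In a normal (non-raw) string literal, Python interprets \1 as an escape sequence, just like \n. Here \1 and \2 are octal escapes, so the pattern ends up containing the control characters chr(1) and chr(2) instead of backreferences.

What you need is a literal backslash that survives string parsing, so that the regex parser gets to see \1.

That means either escaping the backslash (\\1) or using a raw string, so that Python leaves \1, \n and similar sequences alone.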

pattern = re.compile(r"^0b(1*)(0*)(\1\2)*(\1)?$")
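To make the difference concrete, here is a minimal sketch comparing the three spellings of the pattern against the input from the question. The variable names (plain, raw, escaped) are only illustrative and do not come from the original post.

```python
import re

# Non-raw literal: Python turns "\1" and "\2" into the single
# characters chr(1) and chr(2) before the regex engine sees them.
plain = "^0b(1*)(0*)(\1\2)*(\1)?$"

# Raw literal: the backslashes are kept, so the regex engine
# receives real backreferences.
raw = r"^0b(1*)(0*)(\1\2)*(\1)?$"

# Doubled backslashes in a non-raw literal give the same pattern text as `raw`.
escaped = "^0b(1*)(0*)(\\1\\2)*(\\1)?$"

print("\1" == chr(1))         # the octal escape is one control character
print(len("\1"), len(r"\1"))  # 1 character versus 2 characters
print(escaped == raw)         # both spell the same pattern

no_match = re.compile(plain).match("0b101")
match = re.compile(raw).match("0b101")

print(no_match)        # the control characters never occur in "0b101"
print(match.groups())  # backreferences work as intended
```

Printing repr(plain) is a quick way to spot this kind of problem: the backreferences show up as \x01 and \x02.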

0

The regex ^0b(1*)(0*)(\1\2)*(\1)?$, applied to 0b101, matches the following groups:

  • group 1 - '1' (the first 1 in 0b101)
  • group 2 - '0'
  • group 3 - no match, since "10" wasn't encountered after the first two groups
  • group 4 - '1' (the final 1, which successfully matches \1)

>>> import re
>>> pattern = re.compile(r"^0b(1*)(0*)(\1\2)*(\1)?$")
>>> result = pattern.match("0b101")
>>> result.groups()
('1', '0', None, '1')
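The None in the third position means that group 3 never participated in the match, which is different from matching an empty string. As a small sketch (the second input, 0b10101, is an extra example and not from the question), Match.groups() accepts a default argument if an empty string is preferred for such groups:

```python
import re

pattern = re.compile(r"^0b(1*)(0*)(\1\2)*(\1)?$")

# Input from the question: "10" is never repeated, so group 3 is skipped.
result = pattern.match("0b101")
print(result.group(3) is None)
print(result.groups(default=""))  # unmatched groups reported as ''

# Extra input where "10" repeats once, so group 3 does participate.
repeated = pattern.match("0b10101")
print(repeated.groups())
```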

6 Comments

That's not the problem. The way the code is currently written, .match returns None. You haven't even executed the posted code.
Since group 3 is ended with a * symbol it should still result in a match however
@h4z3 "What I would expect to see is the following matching groups: 1: '1' 2: '0' 3: '1'" - I was referencing that comment, referring to the "3" group.
@martijnp No, it won't be matched, it'll be discarded. My answer is purely about the regex, not Python, and it was referring to your expectation that the third group would be matched, which is not correct because only groups 1, 2 and 4 are matched.
It matches, it just places what I put as group 3 in group 4. I forgot to add that group 3 is empty and just left it out since it doesn't contain anything