I am attempting to use part of a regular expression as input for a later part of the same regular expression.

What I have so far (which fails the assertions):

import re
regex = re.compile(r"(?P<length>\d+)(\d){(?P=length)}")
assert bool(regex.match("3123")) is True
assert bool(regex.match("100123456789")) is True

To break this down: the leading digit(s) state how many digits must follow. In the first assertion the first character is 3, so exactly three digits must come after it. If more than 9 digits follow, a single leading digit cannot describe the count, so the first group has to expand to two or more digits and be checked against the remaining digits.

The regular expression 3(\d){3} would properly match the first assertion. However, I cannot get the pattern to work in the general case, where the braces {} are fed a backreference: {(?P=length)}

Calling the regular expression with the re.DEBUG flag I get:

subpattern 1
  max_repeat 1 4294967295
    in
      category category_digit
subpattern 2
  in
    category category_digit
literal 123
groupref 1
literal 125

It looks like the braces { (character code 123) and } (character code 125) are being interpreted as literals when there is a backreference inside them.
When there is no backreference, as in {3}, I can see that {3} is interpreted as max_repeat 3 3.
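
One way to confirm this reading of the debug output is to check which strings the compiled pattern actually accepts. The sketch below is an illustration rather than part of the original post; the test string "31{3}" is made up for the purpose. If the braces are literals, the pattern should accept a string that contains real brace characters and reject "3123".

```python
import re

# Same pattern as in the question: the braces around the backreference
# are parsed as literal '{' and '}' characters, not as a quantifier.
regex = re.compile(r"(?P<length>\d+)(\d){(?P=length)}")

# "31{3}" breaks down as: length="3", one digit "1", a literal "{",
# the backreference text "3", and a literal "}".
literal_match = regex.match("31{3}")

# The string from the first assertion contains no braces, so it cannot match.
intended_match = regex.match("3123")

print(literal_match is not None)   # the literal-brace string is accepted
print(intended_match is None)      # the intended input is rejected
```

So the backreference is substituted as match text, which is why it cannot act as a repeat count.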

Is it possible to use a backreference as the repeat count of a quantifier in a regular expression?

Unfortunately, you can't put a backreference inside a quantifier. That's life. Commented Dec 10, 2016 at 22:12

You can only use backreferences in actual match text, not inside things like repeat quantifiers or character classes. Commented Dec 10, 2016 at 22:12

1 Answer

There is no way to use a backreference as the argument of a limiting quantifier inside the pattern. To solve your current task, I suggest the following code (the inline comments explain the logic):

import re
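def checkNum(s):
    if s == '0':
        return True  # Edge case: 0 is valid input
    # The string must be all digits, at least 2 of them
    if not re.match(r'\d{2,}$', s):
        return False
    lim = ''
    for n in s:
        lim += n  # Try a length prefix that is one character longer each pass
        # Build the pattern dynamically, e.g. 3\d{3}$ when lim == '3'
        if re.match(r'{0}\d{{{0}}}$'.format(lim), s):
            return True
        # The prefix already demands more digits than remain: give up
        if int(lim) > len(s) - len(lim):
            return False
    return False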

print(checkNum('3123'))         # Meets the pattern (123 is a 3-digit chunk after 3)
print(checkNum('1234567'))      # Does not meet the pattern (no prefix equals the count of digits after it)
print(checkNum('100123456789')) # Meets the pattern, 10 is followed by 10 digits
print(checkNum('9123456789'))   # Meets the pattern, 9 is followed by 9 digits

See the Python demo online.

The pattern passed to re.match (which anchors the match at the start of the string) is built from the template {0}\d{{{0}}}$. When 3123 is passed to checkNum, the first pattern tried is 3\d{3}$. It matches a string that starts with 3, followed by exactly 3 digits and then the end of the string ($).
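
Since the quantifier argument has to be computed outside the regex anyway, the same check can also be sketched with plain length arithmetic, using a regex only to validate that the input is all digits. This is an alternative illustration, not the code from the answer above; the name check_length_prefix is hypothetical, and it assumes the same rule as checkNum: some leading run of digits, read as a number, must equal the count of digits that follow it.

```python
import re

def check_length_prefix(s):
    """Return True if some prefix of s, read as an integer, equals the
    number of characters that follow that prefix."""
    # Only ASCII digits are accepted; the empty string is rejected.
    if not re.fullmatch(r'[0-9]+', s):
        return False
    # Try every split point. The final split (empty remainder) covers
    # the '0' edge case, where zero digits follow the prefix.
    for i in range(1, len(s) + 1):
        if int(s[:i]) == len(s) - i:
            return True
    return False

results = {
    '3123': check_length_prefix('3123'),
    '1234567': check_length_prefix('1234567'),
    '100123456789': check_length_prefix('100123456789'),
    '9123456789': check_length_prefix('9123456789'),
}
print(results)
```

This avoids compiling a new pattern for every candidate prefix, at the cost of moving the length rule out of the regex entirely.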

8 Comments

What about validating groups that are 9+ digits? 1501234567890 should be false.
I thought they should pass. Please make the question a bit clearer in that regard. Just remove the first part in the checkNum method.
I can see how my wording would make it confusing. I was trying to highlight the context-sensitive nature of the string.
Well, the point is that you need to get the necessary data for regex building first, and then build the pattern dynamically.
Ok, I see what you mean. The first n digits may be the limiting quantifier argument.