0

Why does my regex numbers match a string like $ab? I want it to match only sequences of decimal digits (0-9) that are either preceded by ${ and followed by }, or preceded by $ and followed by nothing.

import re

numbers = re.compile('\$\{[0-9]*\}|\$[0-9]*')       # ${ANY_SEQUENCE_OF_DIGITS} or $ANY_SEQUENCE_OF_DIGITS
if numbers.match("$ab"):
    print('matches')

This sample code prints 'matches'.

2
  • It isn't directly related here, but you really, really shouldn't use non-raw string literals and unescaped backslashes in regular expressions. Do you have the complete list of Python backslash escapes memorized? Are you 100% sure? Are you 100% sure that anyone who reads your code will? Commented Oct 10, 2014 at 22:09
  • Also, you should really pick a regular expression debugger. There are about 30 available for each platform and 69105 as websites to choose from, all with different strengths and weaknesses as far as what they show you, but all of them are better than trying to reason it out in your head. (Just make sure you pick one that knows Python syntax!) I've been using Debuggex recently, but don't take that as a specific endorsement. Commented Oct 10, 2014 at 22:11

3 Answers

5

Keep in mind that * means zero or more. It matches because you have a $ and zero digits after it. match() does not require the entire string to be matched, just the beginning.

If you want to require at least one digit and nothing extra after it:

numbers = re.compile(r'\$\{[0-9]+\}$|\$[0-9]+$')

This uses + to require "1 or more" digits, as well as an explicit, unescaped $ at the end of each alternative to indicate that there can't be extra stuff at the end (you can leave those off if you do want to allow extra characters on the end).
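As a rough check of the difference, here is a small sketch that runs the pattern from the question and the pattern from this answer against a few inputs. The sample strings and the names loose, strict, and results are made up for illustration; they are not from the question.

```python
import re

# Pattern from the question: * accepts zero digits, and nothing anchors the end.
loose = re.compile(r'\$\{[0-9]*\}|\$[0-9]*')
# Pattern from this answer: + requires at least one digit, $ anchors the end.
strict = re.compile(r'\$\{[0-9]+\}$|\$[0-9]+$')

# Hypothetical sample inputs, chosen only to exercise each case.
samples = ["$ab", "$12", "${12}", "$12ab", "${}"]

# Map each sample to (matched by loose, matched by strict).
results = {s: (bool(loose.match(s)), bool(strict.match(s))) for s in samples}

for s, (is_loose, is_strict) in results.items():
    print(s, is_loose, is_strict)
```

The loose pattern accepts every sample, while the strict one accepts only $12 and ${12}. On Python 3.4 and later, re.fullmatch with the un-anchored + pattern should be another way to get much the same effect; note that a trailing $ also matches just before a final newline, which fullmatch does not allow.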


3

You're matching $ because you have [0-9]*, which also accepts zero digits.

What you probably want is this:

re.compile(r'\$\{[0-9]+\}|\$[0-9]+')

2 Comments

Don't I need to escape the brackets { and } because I want to match them? (i.e. if the brackets are not there, it is not a match)
Yes, I think my escapes got eaten by the markdown editor.

0

If you just had:

import re

numbers = re.compile(r'\$\{[0-9]*\}')       # only ${ANY_SEQUENCE_OF_DIGITS}
if numbers.match("$ab"):
    print('matches')

Then it would not print "matches".

However, since you added |\$[0-9]* you are essentially saying "or match $ followed by zero or more digits." $ab indeed satisfies "$ followed by zero or more digits" and so a match is found.
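To see exactly what gets matched, you can inspect the match object. This is a minimal sketch using the pattern from the question; the variable names matched_text and span are just for illustration.

```python
import re

# Pattern from the question, written as a raw string.
numbers = re.compile(r'\$\{[0-9]*\}|\$[0-9]*')

m = numbers.match("$ab")
matched_text = m.group(0)   # the text the pattern actually consumed
span = m.span()             # (start, end) indices of that text

print(repr(matched_text), span)
```

The match covers only the leading $, with zero digits after it; the ab is simply left over, because match() only requires a match at the start of the string.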
