Python regex matching when it should not

Question

I have a list of strings, and I want to print the ones that don't match a regex, but I'm having some trouble. The regex seems to match strings that it should not whenever a substring starting at the beginning of the string matches the regex. I'm not sure how to fix this.

Example

>>> import re
>>> pattern = re.compile(r'\d+')
>>> string = u"1+*"
>>> bool(pattern.match(string))
True

I get True because of the 1 at the start. How should I change my regex to account for this?

Note: I'm on Python 2.6.6.

Your regex matches numbers, and the string contains a number. You already received some answers based on the hypothesis that you don't want strings which contain anything else, but whether this is really the case is not clear from your question. Perhaps you should edit it to clarify what you want.
– tripleee, Commented Jan 15, 2018 at 4:30

Josh Withee · Accepted Answer · 2018-01-15 03:50:53Z

2

Have your regex start with \A and end with \Z. This will make sure that the match begins at the start of the input string, and also make sure that the match ends at the end of the input string.

So for the example you gave, it would look like:

pattern = re.compile(r'\A\d+\Z')
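
As a rough sketch of how the anchored pattern could serve the original goal of printing the strings that do not match: the list `candidates` and the name `non_matching` are made up here for illustration and are not from the question.

```python
import re

# Anchored at both ends: the whole string must consist of digits.
pattern = re.compile(r'\A\d+\Z')

# Hypothetical sample data, not taken from the question.
candidates = [u"1+*", u"123", u"12a", u"", u"42"]

# Keep only the strings that the anchored pattern rejects.
non_matching = [s for s in candidates if not pattern.match(s)]

for s in non_matching:
    print(repr(s))
```

On Python 3.4 and later, `pattern.fullmatch(s)` should give the same result without the explicit anchors, but it is not available on Python 2.6.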

edited Jan 15, 2018 at 3:50

answered Jan 15, 2018 at 3:45

Alex Huszagh · Accepted Answer · 2018-01-15 03:53:44Z

1

You should append \Z to the end of the regex, so the regex pattern is '\d+\Z'.

Your code then becomes:

>>> import re
>>> pattern = re.compile(r'\d+\Z')
>>> string = u"1+*"
>>> bool(pattern.match(string))
False

This works because \Z matches only at the very end of the string. You may also use $, which matches at the end of the string or just before a newline at the end of the string. If you would like to force the string to contain only digits, you may add a ^ to the front of the pattern, forcing the match to begin at the start of the string. This is redundant with re.match, which already anchors at the start, but it matters with re.search or with other regular expression libraries that do not anchor at the start. The pattern would then be '^\d+\Z'.
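
A small sketch of the difference between $ and \Z when the input ends in a newline. The variable names (`with_newline`, `dollar`, `strict`) are illustrative only.

```python
import re

# Hypothetical input with a trailing newline, e.g. a line read from a file.
with_newline = u"123\n"

# $ also matches just before a single trailing newline, so this pattern accepts the input.
dollar = re.compile(r'\d+$')

# \Z matches only at the very end of the string, so this pattern rejects the input.
strict = re.compile(r'\d+\Z')

dollar_result = bool(dollar.match(with_newline))   # True
strict_result = bool(strict.match(with_newline))   # False

print(dollar_result)
print(strict_result)
```

If trailing newlines should be rejected, \Z is the safer choice of the two.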

edited Jan 15, 2018 at 3:53

answered Jan 15, 2018 at 3:50

1 Comment

Alex Huszagh Over a year ago

I know. If they change the code to re.search though, it's not (but then re.search is pointless). Just added mostly for completeness, because most regular expression libraries work differently (like my favorite library, re2).
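
To illustrate the point about re.search, here is a brief sketch comparing the two functions on a string that only ends with a digit. The sample string and variable names are assumptions for the example.

```python
import re

end_only = re.compile(r'\d+\Z')      # anchored at the end only
both_ends = re.compile(r'\A\d+\Z')   # anchored at both ends

# Hypothetical input: the digit is at the end, not the start.
text = u"+*1"

# match() only tries position 0, where "+" is not a digit.
match_result = bool(end_only.match(text))        # False

# search() scans forward and finds "1" right before the end of the string.
search_result = bool(end_only.search(text))      # True

# With a start anchor as well, search() behaves like match() here.
anchored_search = bool(both_ends.search(text))   # False

print(match_result)
print(search_result)
print(anchored_search)
```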
