8

Is there any easy way to test whether a regex matches an entire string in Python? I thought that putting $ at the end would do this, but it turns out that $ doesn't work in the case of trailing newlines.

For example, the following returns a match, even though that's not what I want.

re.match(r'\w+$', 'foo\n')
1
  • 2
    The de facto standard is \A<your regex>\z. It supersedes all modes, etc. Commented Dec 20, 2015 at 21:14

4 Answers

9

You can use \Z:

\Z

Matches only at the end of the string.

In [5]: re.match(r'\w+\Z', 'foo\n')

In [6]: re.match(r'\w+\Z', 'foo')
Out[6]: <_sre.SRE_Match object; span=(0, 3), match='foo'>
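
As a minimal sketch of the difference (the sample strings are only illustrative), the following compares `$` and `\Z` on a string with a trailing newline. It also shows `re.fullmatch`, which exists in Python 3.4 and later and is not part of the original answer:

```python
import re

# '$' also matches just before a trailing newline, so this succeeds.
dollar = re.match(r'\w+$', 'foo\n')
# '\Z' matches only at the very end of the string, so this fails.
end_only = re.match(r'\w+\Z', 'foo\n')
# re.fullmatch requires the pattern to consume the whole string.
full = re.fullmatch(r'\w+', 'foo\n')

print(dollar is not None)    # True
print(end_only is not None)  # False
print(full is not None)      # False
```

With `re.fullmatch` no end anchor is needed in the pattern at all, which may be convenient when the pattern comes from elsewhere.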

7 Comments

Arguably a better answer than mine, since this is what \Z is for.
@BhargavRao, there is no direct link to the \Z so I just stole the line from the docs, what's in my answer is pretty much it.
Argh, you don't need direct link. It comes under the expression syntax. So you can add the link and mention that it is at the end. :P
@BhargavRao, I think that may well be the least stimulating piece of documentation in the world!
5

To test whether you matched the entire string, just check if the matched string is as long as the entire string:

import re

mystring = "foo\n"  # sample input; '.' does not match the newline
m = re.match(r".*", mystring)
start, stop = m.span()
if stop - start == len(mystring):
    print("The entire string matched")

Note: This is independent of the question (which you didn't ask) of how to match a trailing newline.

1 Comment

I think there is an error in the code. I guess len(mystring) is enough. Subtracting -1 seems not to be correct
3

You can use a negative lookahead assertion to require that the $ is not followed by a trailing newline:

>>> re.match(r'\w+$(?!\n)', 'foo\n')
>>> re.match(r'\w+$(?!\n)', 'foo')
<_sre.SRE_Match object; span=(0, 3), match='foo'>

re.MULTILINE is not relevant here; OP has it turned off and the regex is still matching. The problem is that $ always matches right before the trailing newline:

When [re.MULTILINE is] specified, the pattern character '^' matches at the beginning of the string and at the beginning of each line (immediately following each newline); and the pattern character '$' matches at the end of the string and at the end of each line (immediately preceding each newline). By default, '^' matches only at the beginning of the string, and '$' only at the end of the string and immediately before the newline (if any) at the end of the string.

I have experimentally verified that this works correctly with re.X enabled.
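
The default and re.MULTILINE behaviour of `$` described in the quoted documentation can be sketched by listing the positions where `$` matches. The sample text is an assumption chosen for illustration:

```python
import re

text = 'foo\nbar\n'  # illustrative input, 8 characters long

# Default: '$' matches before the final newline and at the very end.
default_positions = [m.start() for m in re.finditer(r'$', text)]
# MULTILINE: '$' additionally matches before every newline.
multiline_positions = [m.start() for m in re.finditer(r'$', text, re.MULTILINE)]

print(default_positions)    # [7, 8]
print(multiline_positions)  # [3, 7, 8]
```

Position 7 appears in both lists, which is why turning re.MULTILINE off does not stop `$` from matching before the trailing newline.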

1 Comment

Yup, deleted my answer. Misunderstood the OP's goal.
2

Based on @alexis's answer, a function to check for a full match could look like this:

def fullMatch(matchObject, fullString):
    if matchObject is None:
        return False
    start, stop = matchObject.span()
    return stop - start == len(fullString)

Here fullString is the string you apply the regex to, and matchObject is the result of matchObject = re.match(yourRegex, fullString).
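
One caveat with the span-length approach: `re.match` returns the first match it finds, which is not necessarily the longest one. The sketch below uses a hypothetical helper `is_full_match` (not from the answers above) and compares it with `re.fullmatch` (Python 3.4+) on a pattern with alternation:

```python
import re

def is_full_match(pattern, text):
    """Span-length check in the style described above (hypothetical helper)."""
    m = re.match(pattern, text)
    return m is not None and m.end() - m.start() == len(text)

# re.match stops at the first alternative that succeeds ('a'),
# so the span covers only one of the two characters.
span_check = is_full_match(r'a|ab', 'ab')
# re.fullmatch keeps trying alternatives until the whole string is consumed.
full = re.fullmatch(r'a|ab', 'ab')

print(span_check)        # False
print(full is not None)  # True
```

For patterns without such alternatives the two checks agree, so the difference only matters when a shorter match can succeed first.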
