11

I'd like to match strings like:

45 meters?
45, meters?
45?
45 ?

but not strings like:

45 meters you?
45 you  ?
45, and you?

In both cases the question mark must be at the end. So, essentially I want to exclude all those strings containing the word "you".

I've tried the following regex:

'\d+.*(?!you)\?$'

but it also matches the strings in the second group (probably because of the .*).

6
  • Can you occur anywhere in the string, or only at the end? Commented Jun 19, 2014 at 16:36
  • It could be anywhere, but for now I'm interested when it occurs at the end, thank you. Commented Jun 19, 2014 at 16:37
  • I meant "anywhere" in the sense that between you and ? there could be some white spaces. Commented Jun 19, 2014 at 16:41
  • You could try \d+.*?(?<!you\?)$, but it will also match 45 you ?. You can't do (?<!you\s*\?) because lookbehinds (in most flavors) need to be fixed-length. Commented Jun 19, 2014 at 16:42
  • Is regex mandatory? You could simply test it with: "you" in mystring Commented Jun 19, 2014 at 16:42

2 Answers

15

You could try this regex to match all lines that don't contain the string you and that end with ?:

^(?!.*you).*\?$

Explanation:

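This regex uses a negative lookahead. At the start of the line, (?!.*you) checks whether the string you appears anywhere ahead; if it does, the match fails. Otherwise .*\?$ matches the rest of the line, which must end with ?. So it matches every line ending in ? except those containing the string you.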

DEMO
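
As a rough check, the pattern above can be run against the strings from the question in Python. This is only a sketch: the list names are made up for illustration, and the `strict` variant (which requires leading digits and treats you as a whole word via \b) is an assumption about what might be wanted, not part of the answer.

```python
import re

# Pattern from the answer: reject any line containing "you",
# then require a trailing question mark.
pattern = re.compile(r"^(?!.*you).*\?$")

wanted = ["45 meters?", "45, meters?", "45?", "45 ?"]
unwanted = ["45 meters you?", "45 you  ?", "45, and you?"]

matched_wanted = [s for s in wanted if pattern.match(s)]
matched_unwanted = [s for s in unwanted if pattern.match(s)]

# The original attempt, for comparison: (?!you) is tested right before
# the "?", where the next characters can never be "you", so it always passes.
original = re.compile(r"\d+.*(?!you)\?$")
original_unwanted = [s for s in unwanted if original.search(s)]

# Hypothetical stricter variant (an assumption, not from the answer):
# require leading digits and only reject "you" as a whole word.
strict = re.compile(r"^(?!.*\byou\b)\d+.*\?$")
strict_allows_yours = bool(strict.match("45 yours?"))
plain_allows_yours = bool(pattern.match("45 yours?"))
```

Note that the plain pattern rejects any line containing the substring you (for example "yours"), and it does not require the line to start with digits.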


6 Comments

The OP only wants it to discard the match if the word is you.
Sorry, but I need ., because it should even match 45, meters?
did you want this ^[\d,]+ ?(\w+)?\?$?
Better, but again, as @AmalMurali said, it fails 42 test meters?
Yes, I think the second one is the best one, but I'll change it to ^(?!.*you).*\?$ because, as I wrote, I also need the question mark at the end of the string. Thank you!!!
15

There's a neat trick to exclude some matches from a regex, which you can use here:

>>> import re
>>> corpus = """
... 45 meters?
... 45?
... 45 ?
... 45 meters you?
... 45 you  ?
... 45, and you?
... """
>>> pattern = re.compile(r"\d+[^?]*you|(\d+[^?]*\?)")
>>> re.findall(pattern, corpus)
['45 meters?', '45?', '45 ?', '', '', '']

The downside is that you get empty matches when the exclusion kicks in, but those are easily filtered out:

>>> list(filter(None, re.findall(pattern, corpus)))
['45 meters?', '45?', '45 ?']

How it works:

The trick is that we only pay attention to captured groups. The left-hand side of the alternation, \d+[^?]*you (or "digits followed by non-? characters followed by 'you'"), matches what you don't want, and then we forget about it. Only if the left-hand side doesn't match is the right-hand side, (\d+[^?]*\?) (or "digits followed by non-? characters followed by '?'"), tried, and that one is captured.
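
The same idea can be sketched with re.finditer, which avoids the empty strings: when the left-hand branch matches, group 1 does not participate and is None, so it can be skipped directly. The shortened corpus and the name `kept` are only for illustration.

```python
import re

corpus = """
45 meters?
45 ?
45 meters you?
45, and you?
"""

# Left branch consumes the unwanted text without capturing it;
# right branch captures the text we want to keep.
pattern = re.compile(r"\d+[^?]*you|(\d+[^?]*\?)")

# group(1) is None whenever the left (exclusion) branch matched.
kept = [m.group(1) for m in pattern.finditer(corpus) if m.group(1) is not None]
print(kept)
```

One caveat with this pattern: [^?]* also matches newlines, so on multi-line input a line without a ? can run on into the next line.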

2 Comments

+1. I was writing an answer with the same method, but you were faster.
Thanks! Negative lookahead is perhaps the better answer to the OP question, but this answer is the answer to my particular question. I've had a hard time finding it due to the sheer profusion of questions all answered by negative lookahead.
