Firstly, my syntax will not be part of a script as such but it will be parsed via a form input--so any 'existing' solution pointing to Java code will not apply per se.

Okay, so here is what I need to do: I need to be able to input a term like:

'This is your airport and this is your car.' into an input field in such a way that only the word 'airport' or 'airports' is matched. So nothing like '99airport' or 'airport99' should be matched. And I am close!

(?i).*\bair[port|ports].*

If I input the above as RegEx in a test site:

http://www.ocpsoft.org/tutorials/regular-expressions/java-visual-regex-tester/#!;t=123-45-6789%0A9876-5-4321%0A987-65-4321%20(attack)%0A987-65-4321%20%0A192-83-7465&r=(%3Fm)%5E(%5Cd%7B3%7D-%3F%5Cd%7B2%7D-%3F%5Cd%7B4%7D)%24&x=Found%20good%20SSN%3A%20%241

then, indeed, '99airport' does not match because of the word boundary \b at the beginning. However, I don't know how to put a \b at the end of the word so that 'airport99' also does not match. I have tried a few things but no luck. I think it is the syntax around the [] that needs to be figured out.

And please don't pay too much attention to what needs to be matched or not--these are just random words. Currently, if my input has 'airport99' it does get matched but it shouldn't if I can figure out a solution.

Thanks!

  • Just use (?i).*\bairports?\b.*. In Java, String patrn = "(?i).*\\bairports?\\b.*"; Commented Nov 16, 2015 at 14:31
  • Thanks. That would work except I also need the possible combinations inside the [ ]; you see, in real life, I am trying to catch words like 'tornado' or 'Tornado' or 'ToRNADOES' while not catching 'FCGTornadoTeam'. Commented Nov 16, 2015 at 14:33
  • What about country/countries? Commented Nov 16, 2015 at 14:34
  • Yes, that could work as countr with [y|ies] Commented Nov 16, 2015 at 14:35
  • You are mistaking groups for character classes. Let me explain in an answer. Commented Nov 16, 2015 at 14:36

2 Answers

I see you are using matcher.matches() to check for a word inside the input string. That is why you need the .* before and after the keyword: matches() succeeds only if the pattern covers the entire input. Since the text comes from an input field, you do not need to match newline symbols, so there is no need for the (?s) singleline/dotall modifier.
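
As a minimal sketch of that point, the snippet below contrasts matches(), which must consume the whole input, with find(), which looks for the pattern anywhere in it. The class name MatchesVsFind, the helper names, and the sample text are illustrative assumptions and not part of the original question's program.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; names and sample text are for illustration only.
class MatchesVsFind {
    static final String TEXT = "This is your airport and this is your car.";

    // matches() succeeds only if the pattern covers the whole input.
    static boolean wholeInputMatches(String regex) {
        return Pattern.compile(regex).matcher(TEXT).matches();
    }

    // find() succeeds if the pattern occurs anywhere in the input.
    static boolean foundAnywhere(String regex) {
        return Pattern.compile(regex).matcher(TEXT).find();
    }

    public static void main(String[] args) {
        System.out.println(wholeInputMatches("(?i)\\bairports?\\b"));     // false
        System.out.println(wholeInputMatches("(?i).*\\bairports?\\b.*")); // true
        System.out.println(foundAnywhere("(?i)\\bairports?\\b"));         // true
    }
}
```

If the surrounding program can call find() instead of matches(), the leading and trailing .* become unnecessary; when the regex is typed into a form whose matching mode is fixed, keeping the .* wrappers is the safer choice.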

However, you are confusing character classes ([...]) with groups ((...)). A character class matches exactly 1 character. For example, [port|ports] matches 1 character: either p, o, r, t, |, or s. Groups can be used to match specific sequences of symbols. E.g. (port|ports) will match either port or ports.
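
The difference can be checked with a short sketch like the one below. The class name ClassVsGroup and the helper fullMatch are hypothetical names chosen for the demo; only the two patterns come from the discussion above.

```java
import java.util.regex.Pattern;

// Hypothetical demo class contrasting a character class with a group.
class ClassVsGroup {
    // Character class: exactly one character out of p, o, r, t, | and s.
    static final Pattern CHAR_CLASS = Pattern.compile("air[port|ports]");
    // Group with alternation: the whole sequence "port" or "ports".
    static final Pattern GROUP = Pattern.compile("air(port|ports)");

    // True only if the pattern covers the entire string.
    static boolean fullMatch(Pattern pattern, String input) {
        return pattern.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(fullMatch(CHAR_CLASS, "airp"));     // true
        System.out.println(fullMatch(CHAR_CLASS, "air|"));     // true
        System.out.println(fullMatch(CHAR_CLASS, "airport"));  // false
        System.out.println(fullMatch(GROUP, "airport"));       // true
        System.out.println(fullMatch(GROUP, "airports"));      // true
        System.out.println(fullMatch(GROUP, "airp"));          // false
    }
}
```

This is also why the pattern in the question accepted 'airport99': air[port|ports] only consumed "airp", and the trailing .* swallowed everything after it.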

Thus, in your case, you can use

(?i).*\bairports?\b.*

or - less efficient -

(?i).*\bair(port|ports)\b.*

In Java, String patrn = "(?i).*\\bairports?\\b.*";
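
For anyone who wants to try the pattern outside the form, here is a small sketch that runs it against the inputs mentioned in the question. The class AirportCheck and the method mentionsAirport are made-up names, and the sketch assumes the form applies the regex with whole-input matching, as matches() does.

```java
import java.util.regex.Pattern;

// Hypothetical helper; assumes the regex is applied to the whole input.
class AirportCheck {
    private static final Pattern AIRPORT =
            Pattern.compile("(?i).*\\bairports?\\b.*");

    // True if the input contains "airport" or "airports" as a standalone word.
    static boolean mentionsAirport(String input) {
        return AIRPORT.matcher(input).matches();
    }

    public static void main(String[] args) {
        String[] samples = {
            "This is your airport and this is your car.", // true
            "Two AIRPORTS nearby",                        // true
            "99airport",                                  // false
            "airport99"                                   // false
        };
        for (String sample : samples) {
            System.out.println(sample + " -> " + mentionsAirport(sample));
        }
    }
}
```

The trailing \b is what rejects 'airport99': there is no word boundary between "t" and "9", because both are word characters.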

2 Comments

Yes, (?i).*\bair(port|ports)\b.* does it! My problem was using [ ] instead of () to catch port|ports. I will see if this will work inside the actual program or not--it should. Thanks for your help and the explanation. New to RegEx and not really a Java person.
I would really like to stress that just using ports? (where s? matches 1 or 0 s) is a better alternative to (port|ports). For country/countries, you can't avoid using an alternation group, but wherever possible, just use the optional quantifier ? (1 or 0 occurrences). It involves much less backtracking and improves performance. If your input strings are short, that should not be a problem, but best practices are best practices :)
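
A rough sketch of both styles, using the tornado and country examples from the comments under the question: an optional suffix where the plural only appends letters, and an alternation where the ending changes. The class WordForms, the helper matchesInput, and the use of non-capturing groups (?:...) are assumptions made for this illustration.

```java
import java.util.regex.Pattern;

// Hypothetical demo: optional suffix vs. alternation for word forms.
class WordForms {
    // Plural only appends "es", so an optional group is enough.
    static final Pattern TORNADO =
            Pattern.compile("(?i).*\\btornado(?:es)?\\b.*");
    // The ending changes (y -> ies), so an alternation is needed.
    static final Pattern COUNTRY =
            Pattern.compile("(?i).*\\bcountr(?:y|ies)\\b.*");

    // True if the pattern covers the whole input.
    static boolean matchesInput(Pattern pattern, String input) {
        return pattern.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(matchesInput(TORNADO, "Tornado warning"));  // true
        System.out.println(matchesInput(TORNADO, "ToRNADOES"));        // true
        System.out.println(matchesInput(TORNADO, "FCGTornadoTeam"));   // false
        System.out.println(matchesInput(COUNTRY, "Countries"));        // true
        System.out.println(matchesInput(COUNTRY, "countryside"));      // false
    }
}
```

The non-capturing form (?:...) behaves like (...) here; it simply avoids storing a capture that nothing reads.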

This expression should match your requirements:

(?i)\\b(air)?port\\b

It matches "port" and "airport" but does not match "99port", "port99", "99airport", or "airport99".

If a more generic expression is needed, this one should match any word made up only of letters, with or without a leading "air", but no digits or punctuation symbols (because (air)? is optional, drop the ? after the group if the word must start with "air"):

(?i)\\b(air)?[a-z]*\\b

4 Comments

I admit maybe I didn't phrase the question 100% correctly--if I could just prevent both '99airport' and 'airport99' from matching while still retaining the possible words inside the [ ] then that would be all I need. I still need to account for what is inside [ ] and I am not sure your solution accounts for that. Apologies.
Hehe. Yes, it happens :-) I suggest you include in your question several examples of what should be matched and other examples of what should not be matched.
Thanks for understanding. Rephrased the question.
Thanks! I think stribizhev's Answer perfectly matches the requirements; I was close but was using [ ] instead of () to catch the port or ports possibilities.
