0

I'm trying to build a search query which searches for a word in a string and finds matches based on the following criteria:

  • The word is surrounded by a combination of either a space, period or comma
  • The word is at the start of the string and ends with a space, period or comma
  • The word is at the end of the string and is followed by a space, period or comma
  • It's a full match, i.e. the entire string is just the word

For example, if the word is 'php' the following strings would be matches:

  • php
  • mysql, php, javascript
  • php.mysql
  • javascript php

But for instance it wouldn't match:

  • php5

I've tried the following query:

SELECT * FROM candidate WHERE skillset REGEXP '^|[., ]php[., ]|$'

However that doesn't work, it returns every record as a match which is wrong.

Without the ^| and |$ in there, i.e.

SELECT * FROM candidate WHERE skillset REGEXP '[., ]php[., ]'

It successfully finds matches where 'php' is somewhere in the string except the start and end of the string. So the problem must be with the ^| and |$ part of the regexp.

How can I add those conditions in to make it work as required?

2 Answers 2

1

Try '\bphp\b', \b is a word boundary and might just be exactly what you need because it looks for the whole word php.

For MySQL, word boundaries are represented with [[:<:]] and [[:>:]] instead of \b, so use the query '[[:<:]]php[[:>:]]'. More info on word boundaries here.

Sign up to request clarification or add additional context in comments.

3 Comments

Unfortunately, SELECT * FROM candidate WHERE skillset REGEXP '\bphp\b' returns 0 results but should be several hundred.
@dlofrodloh hmm, looks like regex for mysql is slightly different. Try '[[:<:]]php[[:>:]]'?
@Shirley please edit your answer with the fix. Don't leave the solution in the comment please
1

Well, you can play around a bit with regex101.com

Something I found that works for you but doesn't exactly follow your rules is:

/(?=[" ".,]?php[" ".,]?)(?=php[\W])/

This uses the lookahead operator, ?=, to do AND The first portion of the regex is

[" ".,]?php[" ".,]?

This will match anything that has a space, period, or comma before or after the php, but at most only one.

The section portion of the regex is

php[\W]

This will match anything that is php, followed by a non-character. In other words, it will NOT match php followed by a character, digit, or underscore.

It's not the perfect answer for your set of rules, but it does work with your sample data set. Play around on regex101.com and try to make a perfect one.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.