I'd like to use MySQL's REGEXP to match multiple numbers of a csv in a MySQL query.

I am trying to identify whether a CSV string contains the numbers 2 and 9, in that order: a 2 must appear somewhere before a 9. They could be back to back, and either could be at the beginning or end of the string.

The CSV strings below should all produce a positive result:

1,2,3,4,5,6,7,8,9,10
2,9,1,2,3,4,5,10
1,2,3,5,9

These CSV strings should not:

9,2,3,4,5,10 - (2 doesn't exist before 9)
2,1,2,3,4,5,10 - (9 not present)

I've tried to match what I am expecting in the pattern by the following logic:

  1. matching anything or nothing
  2. match the number 2 at least one time
  3. matching anything or nothing
  4. match the number 9 at least one time
  5. matching anything or nothing

My expression, which is close but not working, is:

REGEXP '.*([^0-9][2][^0-9])+.*([^0-9][9][^0-9])+.*'

The above expression fails to match if 2 is the very beginning or 9 the very end of the string. Thanks for the input.

  • what about "2,9,2" or "9,2,9"? Should these be accepted or discarded? Commented Jan 17, 2016 at 7:14
  • Those should both be positive for a result Commented Jan 17, 2016 at 7:31
  • and how big can the numbers in the string be? can your string be something like "22,9" ? Commented Jan 17, 2016 at 7:42
  • The numbers can be any size, typically no more than 4 digits, e.g. 1,2,45,6788,3 Commented Jan 17, 2016 at 9:08
  • Thanks strawberry, but you are assuming the data set is not normalized. That csv string IS the reduced and logical data set to work with. A giant infrastructure re-write to create data in a format that fits one preg_match query or something else, versus using mysql REGEXP isn't practical nor would I call it the recommended route. Hence the specificity of my question. Commented Jan 17, 2016 at 23:13

2 Answers

How about this?

(^|(.*\D))2\D(.*[\D]){0,1}9($|\D.*)

Check out the unit tests in this RegEx-Demo

  • (^|(.*\D)) - the beginning of the string, or any prefix that ends in a non-digit.
  • 2 - we need a 2 first!
  • \D(.*[\D]){0,1} - matches a single separator such as "," (needed when 2 and 9 are directly after each other, as in 2,9), or a separator followed by anything that ends in a non-digit (",...,").
  • 9 - we need a 9 after the 2.
  • ($|\D.*) - the end of the string, or a non-digit followed by anything.
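
On MySQL compatibility: the regex library MySQL used before 8.0.4 does not understand the \D shorthand, so the pattern above would likely need translating before it can go in a REGEXP clause. Below is a minimal sketch of one possible translation. It replaces \D with [^0-9] and drops the leading and trailing .* because REGEXP matches anywhere in the string. This adaptation is not part of the original answer, and the column aliases are made up for illustration.

```sql
-- Sketch only: an adaptation of the pattern above for MySQL's REGEXP operator.
-- [^0-9] stands in for \D; the aliases are illustrative, not from the thread.
SELECT
  '1,2,3,5,9'         REGEXP '(^|[^0-9])2[^0-9](.*[^0-9])?9($|[^0-9])' AS two_then_nine,
  '2,9,1,2,3,4,5,10'  REGEXP '(^|[^0-9])2[^0-9](.*[^0-9])?9($|[^0-9])' AS adjacent_at_start,
  '9,2,3,4,5,10'      REGEXP '(^|[^0-9])2[^0-9](.*[^0-9])?9($|[^0-9])' AS nine_before_two,
  '2,19,1,2,3,4,5,10' REGEXP '(^|[^0-9])2[^0-9](.*[^0-9])?9($|[^0-9])' AS nineteen_only;
```

The first two columns are intended to come back as 1 and the last two as 0; it is worth confirming that against your own MySQL version.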

1 Comment

Is this MySQL-compatible?

MySQL

Since we're using MySQL REGEXP, we can take this approach:

SELECT * FROM table WHERE field REGEXP '[[:<:]]2[[:>:]].*[[:<:]]9[[:>:]]'

Assuming we have only one line of CSV in each row, this will match:

1,2,3,4,5,6,7,8,9,10
2,9,1,2,3,4,5,10
1,2,3,5,9

And not match:

9,2,3,4,5,10
2,1,2,3,4,5,10
20,9,1,2,3,4,5,10
2,19,1,2,3,4,5,10

In MySQL, [[:<:]] and [[:>:]] match the beginning and end of a "word". A comma is not a word character, but adjacent digits still form a single word, which is why the 2 in 20 and the 9 in 19 are not matched.

For example:

mysql> SELECT * FROM test WHERE csv REGEXP '[[:<:]]2[[:>:]].*[[:<:]]9[[:>:]]';
+----+----------------------+
| id | csv                  |
+----+----------------------+
|  1 | 1,2,3,4,5,6,7,8,9,10 |
+----+----------------------+
1 row in set (0.00 sec)
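
To reproduce this locally, a small script along these lines should work. The names test, id and csv follow the query above, but the column types are my assumption, and the sample rows are simply the strings listed earlier in this answer. One caveat: MySQL 8.0.4 and later use the ICU regex library, which does not support [[:<:]] and [[:>:]]. There the documented replacement is \b, written with a doubled backslash inside the string literal, as in the second query.

```sql
-- Assumed schema; only the names test, id and csv come from the query above.
CREATE TABLE test (
  id  INT AUTO_INCREMENT PRIMARY KEY,
  csv VARCHAR(255) NOT NULL
);

INSERT INTO test (csv) VALUES
  ('1,2,3,4,5,6,7,8,9,10'),  -- expected to match
  ('2,9,1,2,3,4,5,10'),      -- expected to match
  ('1,2,3,5,9'),             -- expected to match
  ('9,2,3,4,5,10'),          -- no 2 before a 9
  ('2,1,2,3,4,5,10'),        -- no 9 at all
  ('20,9,1,2,3,4,5,10'),     -- 20 is not 2, and no 9 follows the later 2
  ('2,19,1,2,3,4,5,10');     -- 19 is not 9

-- MySQL before 8.0.4 (Henry Spencer regex library):
SELECT id, csv FROM test
WHERE csv REGEXP '[[:<:]]2[[:>:]].*[[:<:]]9[[:>:]]';

-- MySQL 8.0.4 and later (ICU library): word boundaries are written \b.
SELECT id, csv FROM test
WHERE csv REGEXP '\\b2\\b.*\\b9\\b';
```

Run only the SELECT that matches your server version. With these rows, the first three are the ones expected to come back.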

PCRE

I had originally thought this was a PCRE question, but it was a MySQL REGEXP question! However, in case someone finds it useful, I'll leave this information about PCRE here.

This regex is basically equivalent to the MySQL REGEXP above:

^.*\b2\b.*\b9\b.*$

Using the link above can help you visualize the match.

\b is a "word boundary" (and means basically the same thing as [[:<:]] or [[:>:]] in MySQL), preventing us from matching digits that are part of other numbers.

Note, if you're trying to match the entire multi-line block of text at once, use the m PCRE modifier (PCRE_MULTILINE flag) so that ^ and $ anchor at the beginning and end of each line, rather than the whole string.

So, in PHP, we'd use:

preg_match('/^.*\b2\b.*\b9\b.*$/', $csvRow);

Or:

// PHP has no g modifier; preg_match_all collects every matching line instead.
preg_match_all('/^.*\b2\b.*\b9\b.*$/m', $wholeCsvFile, $matches);

11 Comments

I will test this solution with real data and queries on my data set, and update asap if this works for me. Thanks.
I have tried the regex... without success. I'm guessing because it's not the MySQL flavor of regex. My MySQL statement looks like: $query="WHERE csvline REGEXP '^.*\b2(?:\b[0-9,]+\b)*9\b.*$'"
Ok try my edit! I missed that this was a MySQL regex.
So while that works in PHP, it would only work if I looped over all the row data of a database table and checked against it with preg_match. It's not a viable option for large datasets. Is your PHP regex easily translatable to a MySQL REGEXP query? Thanks for the help.
Will, thank you so much. It's clean and does exactly what I need it to do. It operates the way I tried designing my regexp in the question. I've played with it a bit, and it's easy to adjust for additional conditions, for instance recognizing additional numbers within a csv sequence.