I am stuck. I have a field (Description) in a MySQL table that holds a very long string. Embedded in that string is a reference number that I need to extract into a separate column using a view. The string looks something like this:

LOREM IPSUM DOLOR SIT AMET CONSECTETUR ADIPISCING ELIT INTEGER NEC ODIO XX00000000X LIBERO SED CURSUS ANTE DAPIBUS DIAM SED NISI NULLA QUIS SEM AT NIBH ELEMENTUM IMPERDIET

What I need from that string is XX00000000X. It always starts with two letters, has digits in the middle, and ends with a letter.

I have the following query:

SELECT 
    Description, 
    SUBSTRING_INDEX (Description, ' ',  (Description REGEXP '[[:upper:]]{1,2}[[:digit:]]+[[:upper:]]$') * -1 ) AS Reference 
FROM db_test.tbl_regex;

The problem is that it only collects the Reference data when it is at the end of the Description field.

  • MySQL has no built-in regex replacement/extraction capabilities. Can you be more specific about the string you are trying to match? For example, is it always preceded or followed by one or more strings which might serve as markers? Commented May 10, 2018 at 15:42
  • I don't understand why you're using it in SUBSTRING_INDEX. The REGEXP operator just returns a true/false result saying whether the string matches, not any kind of index. Commented May 10, 2018 at 15:44
  • You can find which rows have such data, but if you're using MySQL you'll need to pluck out the reference number in your app layer. Commented May 10, 2018 at 15:45
  • Agree with @Bohemian, but the reason it only finds those at the end is the $ in your regex. That anchors the match to the end of the string. Commented May 10, 2018 at 15:47
  • In MySQL, you can find rows that match the pattern, but you'll have to extract the matching value in the application layer. Commented May 10, 2018 at 15:48

1 Answer

REGEXP_SUBSTR() was introduced in MariaDB 10.0.5 and MySQL 8.0.

That is what you need to locate and extract the XX00000000X.

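REGEXP (as you used it) only returns true/false, that is, 1 or 0. Your expression therefore passes either -1 or 0 as the count argument of SUBSTRING_INDEX(): with -1 you get the last space-delimited word of Description, and with 0 you get an empty string. That is why the reference only shows up when it is the final word. SUBSTRING_INDEX() needs a delimiter and a count; it cannot locate a pattern.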

If you can't upgrade to one of those, the best you can do is use REGEXP to identify rows that have XX00000000X, then use your app code to extract it.
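
As a rough illustration of the app-code route, here is a minimal Python sketch. It assumes the reference is always two uppercase letters, then one or more digits, then one uppercase letter, standing as its own word; the names REFERENCE_RE and extract_reference are made up for this example and are not part of any library.

```python
import re
from typing import Optional

# Assumed format, taken from the question's description:
# two uppercase letters, one or more digits, one uppercase letter,
# appearing as a whole word anywhere in the string.
REFERENCE_RE = re.compile(r"\b[A-Z]{2}[0-9]+[A-Z]\b")


def extract_reference(description: str) -> Optional[str]:
    """Return the first reference-like token in description, or None."""
    match = REFERENCE_RE.search(description)
    return match.group(0) if match else None


if __name__ == "__main__":
    sample = (
        "LOREM IPSUM DOLOR SIT AMET CONSECTETUR ADIPISCING ELIT INTEGER "
        "NEC ODIO XX00000000X LIBERO SED CURSUS ANTE DAPIBUS DIAM"
    )
    print(extract_reference(sample))
```

You would call extract_reference() on each Description value that your query returns. If you pre-filter rows with REGEXP on the MySQL side, drop the trailing $ from the pattern so that references in the middle of the string are also selected.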

2 Comments

This is a big help. From what I see in the documentation, v8.0 was just released in April. The version history jumps from 5.7 to 8.0 so I will be looking to my host to update in the near future.
Yes, 8.0.11 is hot off the presses. And there is already a bug in a related routine - REGEXP_REPLACE; see stackoverflow.com/questions/50247765/…
