I'm struggling to extract the groups that match a pattern from a string in Oracle 11g. It's nearly working, but I don't understand why the final non-matching part is still there:

select regexp_replace('ST1_12 text1, KG32_1 text2, VI7_08 text3','.*?(\w+\d+_\d+).*','\1,') c1
from dual

Current result: ST1_12,KG32_1,VI7_08, text3

Expected result: ST1_12,KG32_1,VI7_08

It seems to me that the trailing part is not covered by the search pattern and is simply appended to the result, but how do I get rid of it?

1 Answer

After it matches the third group, it starts looking for the next match from text3; the trailing .* is effectively ignored. For the earlier groups that's what you want: otherwise the trailing .* on the first match would consume the rest of the string and you'd lose the other groups. Starting from text3 it doesn't find another match, so that remaining text is returned unchanged.
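
One way to check this yourself is to list what each successive match actually consumes. The sketch below reuses the pattern from the question with regexp_substr and regexp_count; the inline view and the names src, str, match_no and matched are illustrative only and not part of the original query. If the explanation above holds, each row should end at the code itself, and text3 should not appear in any row.

```sql
-- List the text consumed by each successive match of the original pattern.
-- src, str, match_no and matched are illustrative names.
select level as match_no,
       regexp_substr(str, '.*?(\w+\d+_\d+).*', 1, level) as matched
from (
  select 'ST1_12 text1, KG32_1 text2, VI7_08 text3' as str from dual
) src
connect by level <= regexp_count(str, '.*?(\w+\d+_\d+).*');
```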

If the values are always comma-separated, you could add a comma or an end-of-string anchor to the pattern, so that the remaining text (up to the space after the comma, anyway) is included in the match but not in \1:

select regexp_replace('ST1_12 text1, KG32_1 text2, VI7_08 text3','.*?(\w+\d+_\d+).*(,|$)', '\1,') c1
from dual;

ST1_12,KG32_1,VI7_08,

You can use a trim function to get rid of the trailing comma:

select rtrim(regexp_replace('ST1_12 text1, KG32_1 text2, VI7_08 text3','.*?(\w+\d+_\d+).*(,|$)','\1,', 1, 0, null),
  ',') as c1
from dual;

ST1_12,KG32_1,VI7_08

Another option, which doesn't rely on the commas existing, is to split the string into multiple values:

select regexp_substr('ST1_12 text1, KG32_1 text2, VI7_08 text3', '(\w+\d+_\d+)', 1, level, null, 1) as c1
from dual
connect by level <= regexp_count('ST1_12 text1, KG32_1 text2, VI7_08 text3', '(\w+\d+_\d+)');

ST1_12
KG32_1
VI7_08

and then aggregate them back together:

select listagg(
  regexp_substr('ST1_12 text1, KG32_1 text2, VI7_08 text3', '(\w+\d+_\d+)', 1, level, null, 1),
  ',') within group (order by level) as c1
from dual
connect by level <= regexp_count('ST1_12 text1, KG32_1 text2, VI7_08 text3', '(\w+\d+_\d+)');

ST1_12,KG32_1,VI7_08
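
If repeating the literal in several places is a concern, the same query can be written against a single-row subquery factoring clause. This is only a sketch: the CTE name src and the column name str are made up for illustration, and it assumes exactly one source row, since connect by level over several rows would need extra conditions to stop the rows from multiplying against each other.

```sql
-- Same split-and-aggregate approach, with the source string defined once.
-- src and str are illustrative names; assumes a single input row.
with src as (
  select 'ST1_12 text1, KG32_1 text2, VI7_08 text3' as str from dual
)
select listagg(
  regexp_substr(str, '(\w+\d+_\d+)', 1, level, null, 1),
  ',') within group (order by level) as c1
from src
connect by level <= regexp_count(str, '(\w+\d+_\d+)');
```

For the sample string this should return the same ST1_12,KG32_1,VI7_08 as the query above, since only the source of the string has changed.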

1 Comment

Thank you very much, Alex, for your thorough explanation. Much appreciated!
