Oracle regexp_replace to pick out pattern matching groups

Question

I'm struggling to extract the groups that match a pattern from a string in Oracle 11g. It's nearly working, but I don't understand why the final non-matching part is still there:

select regexp_replace('ST1_12 text1, KG32_1 text2, VI7_08 text3','.*?(\w+\d+_\d+).*','\1,') c1
from dual

Current result: ST1_12,KG32_1,VI7_08, text3

Expected result: ST1_12,KG32_1,VI7_08

It seems to me that the end part is not included in the search pattern and is simply appended to the result, but how do I get rid of it?

Alex Poole · Accepted Answer · 2021-06-25 17:23:34Z

After it matches the third group, it starts looking for the next match from text3; the trailing .* is effectively ignored. For the earlier groups that's what you want: otherwise the trailing .* on the first group would consume the rest of the string and you'd lose the other groups. When it starts from text3 it doesn't find another match, so that remaining text is returned unchanged.
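To see where each match starts and stops, it can help to wrap the captured group in visible delimiters instead of appending a comma. The following is a small illustrative variation on the query from the question; the square brackets are just markers chosen here, not part of the original:

```sql
-- Same pattern as in the question; only the replacement string differs.
-- Each successful match is replaced by its captured group in brackets,
-- so anything left outside the brackets was never matched.
select regexp_replace('ST1_12 text1, KG32_1 text2, VI7_08 text3',
  '.*?(\w+\d+_\d+).*', '[\1]') as c1
from dual;
```

Given the result shown in the question, this should return [ST1_12][KG32_1][VI7_08] text3, which makes it visible that the last piece of text sits after the final match and is passed through untouched.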

If the values are always comma-separated then you could include a comma or an end-of-string anchor to make it include the remaining text - up to the space anyway - in the match, but not in \1:

select regexp_replace('ST1_12 text1, KG32_1 text2, VI7_08 text3','.*?(\w+\d+_\d+).*(,|$)', '\1,') c1
from dual;

ST1_12,KG32_1,VI7_08,

You can use a trim function to get rid of the trailing comma:

select rtrim(
  regexp_replace('ST1_12 text1, KG32_1 text2, VI7_08 text3','.*?(\w+\d+_\d+).*(,|$)','\1,', 1, 0, null),
  ',') as c1
from dual;

ST1_12,KG32_1,VI7_08

Another option, which doesn't rely on the commas existing, is to split the string into multiple values:

select regexp_substr('ST1_12 text1, KG32_1 text2, VI7_08 text3', '(\w+\d+_\d+)', 1, level, null, 1) as c1
from dual
connect by level <= regexp_count('ST1_12 text1, KG32_1 text2, VI7_08 text3', '(\w+\d+_\d+)');

ST1_12
KG32_1
VI7_08

and then aggregate them back together:

select listagg(
  regexp_substr('ST1_12 text1, KG32_1 text2, VI7_08 text3', '(\w+\d+_\d+)', 1, level, null, 1),
  ',') within group (order by level) as c1
from dual
connect by level <= regexp_count('ST1_12 text1, KG32_1 text2, VI7_08 text3', '(\w+\d+_\d+)');

ST1_12,KG32_1,VI7_08

db<>fiddle
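If the strings live in a table rather than in a literal, the same split-and-aggregate idea can be applied per row. The sketch below assumes a hypothetical source with columns id and str (built inline from dual here so that it is self-contained). The extra connect by conditions are a commonly used idiom for keeping each row's hierarchy separate; they are an addition for illustration, not part of the queries above:

```sql
-- Hypothetical sample data: id and str are illustrative names only.
with t as (
  select 1 as id, 'ST1_12 text1, KG32_1 text2, VI7_08 text3' as str from dual
  union all
  select 2 as id, 'AB3_4 text4' as str from dual
)
select id,
  listagg(
    regexp_substr(str, '(\w+\d+_\d+)', 1, level, null, 1),
    ',') within group (order by level) as c1
from t
-- generate one level per match found in this row's string
connect by level <= regexp_count(str, '(\w+\d+_\d+)')
  -- stay on the same source row while generating levels
  and prior id = id
  -- non-deterministic call that stops Oracle from reporting a loop
  and prior sys_guid() is not null
group by id;
```

This should produce one comma-separated list per id. Without the two prior conditions, connect by would combine levels across rows and generate duplicate values.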

Thank you very much, Alex, for your thorough explanation. Much appreciated!
