I got an unexpected result when using regexp_replace to append one string to the end of another (I was doing it this way purely as an exercise). I'm posting it both to find out why it happens and to let others know about this possibly surprising behavior.

Consider this statement, where the intent is to tack "note 2" onto the end of the string "note 1". My idea was to capture the entire line in a group, then concatenate the new string to the end:

select regexp_replace('note 1', '(.*)', '\1' || ' note 2') try_1 from dual;

But take a look at the result:

TRY_1               
--------------------
note 1 note 2 note 2

The appended string shows up twice! Why?

If I change the pattern to include the start of line and end of line anchors, it works as expected:

select regexp_replace('note 1', '^(.*)$', '\1' || ' note 2') try_2 from dual;

TRY_2        
-------------
note 1 note 2

Why should that make a difference?

EDIT: please see Politank-Z's explanation below. I wanted to add that if I change the first example to use a plus (match one or more occurrences of the preceding element) instead of the asterisk (zero or more occurrences), it works as expected:

select regexp_replace('note 1', '(.+)', '\1' || ' note 2') try_3 from dual;

TRY_3        
-------------
note 1 note 2

1 Answer

As per the Oracle Documentation:

By default, the function returns source_char with every occurrence of the regular expression pattern replaced with replace_string.

The key there is "every occurrence". The pattern .* can match the empty string, so the Oracle regexp engine first matches the entire string and then matches the empty string that follows it. Each match is replaced, which is why ' note 2' is appended twice. Adding the anchors eliminates the second match. Alternatively, you could specify the occurrence parameter per the linked documentation.
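
As a rough sketch of the occurrence approach: regexp_replace accepts optional position and occurrence arguments after the replacement string. Passing 1 for both should start the search at the first character and replace only the first match, leaving the trailing empty match alone. The alias try_4 is just an illustrative name following the question's numbering.

```sql
-- Sketch only: the 4th argument (position) = 1 starts at the first character,
-- the 5th argument (occurrence) = 1 replaces only the first match of the pattern.
-- try_4 is a hypothetical column alias, not part of the original question.
select regexp_replace('note 1', '(.*)', '\1' || ' note 2', 1, 1) try_4 from dual;
```

Because only the first (whole-string) match is replaced, this should give "note 1 note 2" without needing the anchors.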

3 Comments

Can you explain where the "empty string" comes from? Thanks
The asterisk indicates that the preceding regexp atom occurs zero or more times. For your overall regexp, that means zero occurrences (an empty string) is a valid match. regexp_replace applies your regexp once, matching the entire string (see greedy vs. non-greedy matching), then looks for another match starting where the previous match ended. The previous match ended after the last character, so all that remains is an empty string, which .* also matches.
Very interesting! I replaced the pattern '.*' with '.+' (the plus being for matching 1 or more as opposed to the asterisk meaning 0 or more) in the first example and it works as expected! Thanks Politank-Z!
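
One way to make the two matches visible, offered as a sketch rather than anything from the thread itself, is to wrap each match in brackets instead of appending text. The aliases star_matches and plus_matches are hypothetical names chosen for illustration.

```sql
-- Sketch only: wrap every match in [ ] so each one shows up in the output.
-- Given the behavior described above, '.*' should match twice
-- (the whole string, then the empty string at the end).
select regexp_replace('note 1', '(.*)', '[\1]') star_matches from dual;

-- With '.+' at least one character is required, so the trailing
-- empty string is not a match and only one bracket pair should appear.
select regexp_replace('note 1', '(.+)', '[\1]') plus_matches from dual;
```

If the explanation above holds, the first query should return [note 1][] and the second [note 1], which lines up with the results of try_1 and try_3.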
