Explain unexpected regexp_replace result

Question

I got an unexpected result when using regexp_replace to append one string to the end of another (as an exercise in doing it with regexp_replace). I'm bringing it up not only to figure out why it happens, but also to let folks know about this possibly surprising behavior.

Consider this statement, where the intent is to tack "note 2" onto the end of the string "note 1". My intention was to capture the entire line in a group, then concatenate the new string to the end:

select regexp_replace('note 1', '(.*)', '\1' || ' note 2') try_1 from dual;

But take a look at the result:

TRY_1               
--------------------
note 1 note 2 note 2

The string " note 2" gets appended twice! Why?

If I change the pattern to include the start of line and end of line anchors, it works as expected:

select regexp_replace('note 1', '^(.*)$', '\1' || ' note 2') try_2 from dual;

TRY_2        
-------------
note 1 note 2

Why should that make a difference?

EDIT: please see Politank-Z's explanation below. I wanted to add that if I change the first example to use a plus (match one or more occurrences of the preceding element) instead of the asterisk (zero or more occurrences of the preceding element), it works as expected:

select regexp_replace('note 1', '(.+)', '\1' || ' note 2') try_3 from dual;

TRY_3        
-------------
note 1 note 2

Community · Accepted Answer · 2020-06-20 09:12:55Z


As per the Oracle Documentation:

By default, the function returns source_char with every occurrence of the regular expression pattern replaced with replace_string.

The key there is "every occurrence". The pattern .* can match the empty string, so the Oracle regexp engine first matches the entire string and then matches the empty string that follows it. Adding the anchors eliminates that second match. Alternatively, you could specify the occurrence parameter per the linked documentation.
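
The occurrence parameter mentioned above can be sketched as follows. This is a minimal sketch, assuming the documented REGEXP_REPLACE argument order (source, pattern, replacement, position, occurrence); the alias try_4 is made up for illustration. With occurrence set to 1, only the first match (the whole string) should be replaced, so the trailing empty match is left alone:

```sql
-- position = 1: start searching at the first character
-- occurrence = 1: replace only the first match instead of every match
select regexp_replace('note 1', '(.*)', '\1' || ' note 2', 1, 1) try_4 from dual;
```

The default for occurrence is 0, which means every match is replaced. That is why the original query also rewrote the empty match.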

edited Jun 20, 2020 at 9:12 by CommunityBot

answered Apr 6, 2015 at 14:37 by Politank-Z


3 Comments

Gary_W Over a year ago

Can you explain where the "empty string" comes from? Thanks

Politank-Z Over a year ago

The asterisk indicates that the preceding regexp atom occurs zero or more times. Given your overall regexp, that means zero occurrences (an empty string) is a valid match. regexp_replace applies your regexp once, matching the entire string (see greedy vs non-greedy in regexp terms), then looks for another match starting at the end of the previous match. The end of the previous match is after the last character, leaving an empty string, which also matches.

Gary_W Over a year ago

Very interesting! I replaced the pattern '.*' with '.+' (the plus being for matching 1 or more as opposed to the asterisk meaning 0 or more) in the first example and it works as expected! Thanks Politank-Z!
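
The two matches described in the comments can be made visible by wrapping each match in brackets. This is only an illustrative sketch; the brackets and the alias show_matches are made up here. Given the result of the first query in the question, the greedy match on the whole string should come back as [note 1], followed by an empty pair of brackets for the zero-length match at the end:

```sql
-- each match of '(.*)' is replaced by its captured text wrapped in brackets,
-- so a zero-length match shows up as '[]'
select regexp_replace('note 1', '(.*)', '[\1]') show_matches from dual;
```

Swapping '(.*)' for '(.+)' in the same query should leave only the first pair of brackets, since the plus cannot match zero characters.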
