REGEXP_REPLACE replace spaces between two symbols

Question

I need to replace every run of spaces between two specific symbols (@ and &) with a single %, as in the following examples:

'this @ is test   &that did not @turn& out well'

should be converted to

'this @%is%test%&that did not @turn& out well'

and

'@pattern matching&  is my number one enemy'

to

'@pattern%matching&  is my number one enemy'

I have read almost all of the related questions on Stack Overflow and other sites, but couldn't find a helpful answer.

Please share the code that fails for you. The most recent attempt to solve the problem would do.
– Wiktor Stribiżew, Commented Jan 8, 2019 at 20:00
Ok, I see. The issue here is that you cannot use two patterns as start and end delimiters to search for multiple matches in-between with PostgreSQL regex. It is possible in other regex flavors either thanks to infinite width lookbehind or \G operator.
– Wiktor Stribiżew, Commented Jan 8, 2019 at 20:15

marc_s · Accepted Answer · 2019-11-22 16:39:41Z

One (inefficient) way of doing this is to call REGEXP_REPLACE repeatedly.

For example, let's look at the following PL/pgSQL function.

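CREATE OR REPLACE FUNCTION replaceSpacesBetweenTwoSymbols(startChar TEXT, endChar TEXT, textToParse TEXT)
    RETURNS TEXT
AS $$
DECLARE
    resultText TEXT := textToParse;  -- text after the most recent replacement
    tempText   TEXT;                 -- result of the next replacement attempt
BEGIN
    LOOP
        -- Replace one space that lies between startChar and the next endChar.
        tempText := REGEXP_REPLACE(resultText,
                                   '(' || startChar || '[^' || endChar || ']*)' || '( )(.*' || endChar || ')',
                                   '\1%\3');
        -- Stop once a pass changes nothing. IS NOT DISTINCT FROM also ends the
        -- loop for NULL input, where a plain = comparison would never be true.
        EXIT WHEN tempText IS NOT DISTINCT FROM resultText;
        resultText := tempText;
    END LOOP;
    RETURN resultText;
END;
$$
LANGUAGE plpgsql;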

We create a function that takes three arguments: startChar, endChar, and textToParse, which holds the text to be processed.

We start by building a regular expression from startChar and endChar. If the value of startChar is @ and the value of endChar is &, we get the following regular expression:

(@[^&]*)( )(.*&)
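To make the concatenation concrete, the sketch below rebuilds the pattern with literal '@' and '&'. It also illustrates one caveat: the delimiters are pasted into the pattern unescaped, so a delimiter that happens to be a regex metacharacter is treated as syntax. The '.' delimiter and the 'a b&' input are made-up illustrations, not part of the original answer.

```sql
-- Same concatenation as in the function, with startChar = '@' and endChar = '&'.
SELECT '(' || '@' || '[^' || '&' || ']*)' || '( )(.*' || '&' || ')' AS pattern;
-- (@[^&]*)( )(.*&)

-- Caveat (illustrative input): with startChar = '.', the dot matches any
-- character, so the space is replaced although the text contains no '.'.
SELECT regexp_replace('a b&',
                      '(' || '.' || '[^' || '&' || ']*)' || '( )(.*' || '&' || ')',
                      '\1%\3') AS unintended;
-- a%b&
```

If the delimiters can be arbitrary characters, escaping them before building the pattern would likely be needed; for '@' and '&' no escaping is required.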

This regular expression consists of three groups:

(@[^&]*) - This group matches the @ and the text that follows it, up to the space being replaced; it cannot contain the & character.
( ) - This group matches a single space character.
(.*&) - This group matches the rest of the text, from that space up to the last & character in the string.
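The three groups can be inspected directly. The sketch below assumes PostgreSQL 10 or later, where regexp_match is available, and uses the first sample string from the question; the column aliases are only illustrative.

```sql
-- Show what each capture group matches on the first sample string.
SELECT m[1] AS group_1,  -- '@ is test  ' : the '@' and the text before the chosen space
       m[2] AS group_2,  -- ' '           : the single space that will become '%'
       m[3] AS group_3   -- '&that did not @turn&' : greedy, runs to the last '&'
FROM regexp_match('this @ is test   &that did not @turn& out well',
                  '(@[^&]*)( )(.*&)') AS t(m);
```

Because the first group is greedy, it takes as many characters as it can, so the space captured by the second group is the last one before the first &.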

In order to replace the space (group 2), we use the following REGEXP_REPLACE call:

REGEXP_REPLACE(resultText, '(@[^&]*)( )(.*&)', '\1%\3')

From that expression you can see that we are replacing the second group (which is a space) with the % character.

This way, each REGEXP_REPLACE call replaces only one space. Once a call leaves the text unchanged, there are no more spaces to replace, and we return the modified text.
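A single pass can be observed by running the call once on the first sample string. The second query is a sketch of why the 'g' flag alone does not remove the need for repeated calls with this pattern: the first match already extends to the last &, so no text containing a further match remains.

```sql
-- One call replaces exactly one space: the last one before the first '&'.
SELECT regexp_replace('this @ is test   &that did not @turn& out well',
                      '(@[^&]*)( )(.*&)', '\1%\3') AS one_pass;
-- this @ is test  %&that did not @turn& out well

-- With 'g' the result is the same, because the single match spans
-- from the first '@' to the last '&' and leaves nothing to match again.
SELECT regexp_replace('this @ is test   &that did not @turn& out well',
                      '(@[^&]*)( )(.*&)', '\1%\3', 'g') AS one_pass_with_g;
-- this @ is test  %&that did not @turn& out well
```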

At this point, every space between the two symbols has been replaced with a % character. The last thing we need to do is collapse runs of consecutive % characters into a single %.

That can be done with another REGEXP_REPLACE call at the end. So for example:

SELECT REGEXP_REPLACE(replaceSpacesBetweenTwoSymbols('@','&','this @ is test   &that did not @turn& out well'),'%{2,}','%','g');

Will return

this @%is%test%&that did not @turn& out well

as a result, while this

SELECT REGEXP_REPLACE(replaceSpacesBetweenTwoSymbols('@','&','this is @a more  complex& task @test a a & w'),'%{2,}','%','g');

will return

this is @a%more%complex& task @test%a%a%& w

as a result.
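One caveat about the cleanup step: REGEXP_REPLACE replaces only the first match unless the 'g' flag is passed, so text with more than one run of consecutive % characters needs 'g' to collapse every run. A small sketch on a made-up string:

```sql
SELECT regexp_replace('@a%%b%%%c&', '%{2,}', '%')      AS first_run_only,  -- @a%b%%%c&
       regexp_replace('@a%%b%%%c&', '%{2,}', '%', 'g') AS every_run;       -- @a%b%c&
```

Note that this cleanup also collapses any %% that was already present in the input, including outside the two symbols.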

Best answer ever! Thank you. I used the code with a few small changes.
