I need to replace every run of spaces between two specific symbols (@ and &) with a single %, as in the following examples:

'this @ is test   &that did not @turn& out well' 

should be converted to

'this @%is%test%&that did not @turn& out well'

and

'@pattern matching&  is my number one enemy'

to

'@pattern%matching&  is my number one enemy'

I have read almost all of the related questions on Stack Overflow and other sites but couldn't find a helpful answer.

  • Please share the code that fails for you. The most recent attempt to solve the problem would do. Commented Jan 8, 2019 at 20:00
  • SELECT REGEXP_REPLACE('@ M j &test','@( )(.*)()@','@% %&') Commented Jan 8, 2019 at 20:03
  • Ok, I see. The issue here is that you cannot use two patterns as start and end delimiters to search for multiple matches in-between with PostgreSQL regex. It is possible in other regex flavors either thanks to infinite width lookbehind or \G operator. Commented Jan 8, 2019 at 20:15

1 Answer

One (inefficient) way of doing this is to call REGEXP_REPLACE repeatedly.

For example, let's look at the following PL/pgSQL function.

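CREATE OR REPLACE FUNCTION replaceSpacesBetweenTwoSymbols(startChar TEXT, endChar TEXT, textToParse TEXT)
    RETURNS TEXT
AS $$
DECLARE
    resultText TEXT := textToParse;  -- text after the replacements made so far
    tempText   TEXT;                 -- result of the next replacement pass
BEGIN
    LOOP
        -- Without the 'g' flag, REGEXP_REPLACE replaces only the first match,
        -- so each pass turns exactly one space into '%'.
        -- startChar and endChar are inserted into the pattern as-is (not escaped).
        tempText := REGEXP_REPLACE(resultText,
                                   '(' || startChar || '[^' || endChar || ']*)' || '( )(.*' || endChar || ')',
                                   '\1%\3');
        -- Stop as soon as a pass changes nothing.
        EXIT WHEN tempText = resultText;
        resultText := tempText;
    END LOOP;
    RETURN resultText;
END;
$$
-- STRICT returns NULL for NULL input; otherwise the comparison above
-- would never be true and the loop would not terminate.
LANGUAGE plpgsql STRICT;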

The function takes three arguments: startChar, endChar, and textToParse, which holds the text to be processed.

We start by building a regular expression from startChar and endChar. If the value of startChar is @ and the value of endChar is &, we get the following regular expression:

(@[^&]*)( )(.*&)
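To see how the pattern is assembled, the concatenation from the function body can be run on its own. This is only a sketch, with the two delimiters written as literals instead of the startChar and endChar parameters:

```sql
-- Same concatenation as in the function body, with startChar = '@' and endChar = '&'.
SELECT '(' || '@' || '[^' || '&' || ']*)' || '( )(.*' || '&' || ')' AS pattern;
-- pattern: (@[^&]*)( )(.*&)
```

The delimiters are inserted into the pattern as-is, so a delimiter that is itself a regex metacharacter (for example an opening parenthesis) would likely need to be escaped before it is passed in.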

This regular expression consists of three groups:

  1. (@[^&]*) - This group matches the @ and the text that follows it, up to (but not including) a space character. It cannot cross an &, so the space must lie between the @ and the next &.

  2. ( ) - This group matches a single space character.

  3. (.*&) - This group matches the rest of the text after that space, up to and including an & character. Because .* is greedy, it runs to the last & in the string.
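The three groups can be inspected directly. The sketch below assumes PostgreSQL 10 or later, where regexp_match is available; it is not part of the function above and only shows what each group captures on the first pass over the sample text.

```sql
-- Returns a text[] with one element per capturing group of the first match.
SELECT regexp_match('this @ is test   &that did not @turn& out well',
                    '(@[^&]*)( )(.*&)') AS groups;
-- groups[1] should be '@ is test  ' (everything up to the last space before the first &)
-- groups[2] should be ' '           (the single space that will be replaced)
-- groups[3] should be '&that did not @turn&' (greedy .* runs to the last & in the string)
```

Group 1 is greedy as well, which is why the spaces are replaced from right to left, one per pass.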

In order to replace the space (group 2), we use the following REGEXP_REPLACE call:

REGEXP_REPLACE(resultText, '(@[^&]*)( )(.*&)', '\1%\3')

The replacement keeps groups 1 and 3 and puts a % character in place of group 2 (the space).

Because the call is made without the 'g' flag, each REGEXP_REPLACE execution replaces only one space. Once a call leaves the text unchanged, there are no more spaces to replace, and we return the modified text.
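As a small illustration of one pass and of the stop condition, here are two calls on the second sample string from the question, with the pattern written out for @ and &:

```sql
-- Pass 1: the only space between @ and & becomes %.
SELECT REGEXP_REPLACE('@pattern matching&  is my number one enemy',
                      '(@[^&]*)( )(.*&)', '\1%\3');
-- '@pattern%matching&  is my number one enemy'

-- Pass 2: no space is left between @ and &, so the pattern does not match
-- and the input comes back unchanged. This is the condition that ends the loop.
SELECT REGEXP_REPLACE('@pattern%matching&  is my number one enemy',
                      '(@[^&]*)( )(.*&)', '\1%\3');
-- '@pattern%matching&  is my number one enemy'
```

The two spaces after the & are left alone because they are not enclosed by the delimiters.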

At this point, every space between the two symbols has been replaced with a % character. The last thing we need to do is to replace each run of consecutive % characters with a single %.

That can be done with another REGEXP_REPLACE call at the end, this time with the 'g' flag so that every run is collapsed and not only the first one. For example,

SELECT REGEXP_REPLACE(replaceSpacesBetweenTwoSymbols('@','&','this @ is test   &that did not @turn& out well'),'%{2,}','%','g');

will return

this @%is%test%&that did not @turn& out well

while

SELECT REGEXP_REPLACE(replaceSpacesBetweenTwoSymbols('@','&','this is @a more  complex& task @test a a & w'),'%{2,}','%','g');

will return

this is @a%more%complex& task @test%a%a%& w
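If both steps are always used together, they could be wrapped in one helper. The function below is a sketch only: the name replaceSpaceRunsBetweenTwoSymbols is made up for illustration, and it assumes replaceSpacesBetweenTwoSymbols from above already exists.

```sql
-- Hypothetical convenience wrapper: replace the spaces, then collapse runs of %.
CREATE OR REPLACE FUNCTION replaceSpaceRunsBetweenTwoSymbols(startChar TEXT, endChar TEXT, textToParse TEXT)
    RETURNS TEXT
AS $$
    -- The 'g' flag collapses every run of two or more % characters, not only the first.
    SELECT REGEXP_REPLACE(replaceSpacesBetweenTwoSymbols(startChar, endChar, textToParse),
                          '%{2,}', '%', 'g');
$$
LANGUAGE sql;

-- Two separate runs of spaces between the delimiters:
SELECT replaceSpaceRunsBetweenTwoSymbols('@', '&', '@a  b  c&');
-- '@a%b%c&'
```

One caveat: the collapse step runs over the whole string, so a run of % characters that was already present outside the delimiters is merged as well.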

1 Comment

Best answer ever! Thank you. I used the code with a few small changes.
