I'm trying to remove everything between two captions in postgres:

regexp_replace(text, 'caption1:[\S\s\n\r]+?:', '', 'ig') AS text

But I get this error:

ERROR: invalid regular expression: invalid escape \ sequence
SQL state: 2201B

It looks like PostgreSQL doesn't allow me to match with \S (any non-whitespace character) here.

Example text:

Lorem ipsum

Caption1:
I want this text to be removed.
And this line too.


Caption2:
Consectetuer adipiscing elit.

It should become:

Lorem ipsum

Consectetuer adipiscing elit.

4 Answers

1

From the documentation:

Within bracket expressions, \d, \s, and \w lose their outer brackets, and \D, \S, and \W are illegal. (So, for example, [a-c\d] is equivalent to [a-c[:digit:]]. Also, [a-c\D], which is equivalent to [a-c^[:digit:]], is illegal.)

So your regex should be:

caption1:[^[:space:][:space:]\n\r]+?:
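
A minimal sketch of the bracket-expression rule quoted above. It assumes the default setting standard_conforming_strings = on and a server version where the rule applies as quoted; the input 'a b' is made up for illustration.

```sql
-- \s is accepted inside a bracket expression: 'a' and the space are both replaced.
SELECT regexp_replace('a b', '[a\s]', 'x', 'g');        -- xxb

-- A negated POSIX class is the bracket-safe way to say "non-whitespace".
SELECT regexp_replace('a b', '[^[:space:]]', 'x', 'g'); -- x x

-- Per the quoted rule, \S inside brackets is rejected with an "invalid escape" error:
-- SELECT regexp_replace('a b', '[\S]', 'x', 'g');
```

Note that a class negating [:space:] matches only non-whitespace characters, so on its own it cannot cross a line break.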

1 Comment

I tried, but it doesn't remove anything. But thanks for the inspiration to use the not ^ operator, that does the trick for me too!
1

This eventually worked for me:

regexp_replace(text, 'caption1:[^:]+?:', '', 'ig') AS text

1 Comment

See my answer for a more optimized solution and explanation why your original solution did not work.
0

You need two backslashes if you want to use an escaped character class, e.g. use \\s instead of \s. But in any case, I don't think your logic really requires this. Instead, you might be able to use the following query:

SELECT 'Caption1: ' || right(text, char_length(text) - position('Caption2' in text) + 1)
FROM yourTable

1 Comment

You only need two backslashes if you changed the default configuration to not use standard conforming strings. Otherwise the backslash is not a special character that needs escaping in SQL.
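
A short sketch of the point made in this comment, assuming a server with the default standard_conforming_strings = on. The input 'a b' is made up for illustration.

```sql
-- Check the current setting (on by default in modern PostgreSQL).
SHOW standard_conforming_strings;

-- Ordinary string literal: the backslash reaches the regex engine unchanged.
SELECT regexp_replace('a b', '\s', '', 'g');    -- ab

-- Escape-string syntax (E'...'): the backslash must be doubled to survive.
SELECT regexp_replace('a b', E'\\s', '', 'g');  -- ab
```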
0

The [\S\s\n\r] cannot work in PostgreSQL because this engine does not support the negated Perl-like shorthand character classes (\S, \D, \W) inside bracket expressions (i.e. inside [...]). Using one there raises the "invalid escape \ sequence" error shown in the question.

You need to use

regexp_replace(text, 'caption1:[^:]+:', '', 'ig') AS text

Note that + is a regular greedy quantifier that matches one or more occurrences of the pattern it modifies. The quantified pattern is [^:]. It is a character class (also called a bracket expression) that is negated by the ^ char placed right after the opening [. So, [^:] matches any char other than a :, including line break chars.

You do not need ? after +. Since [^:] can never match a colon, the lazy and greedy versions match the same text here, and the lazy version will work slower than the greedy one.

So, use caption1:[^:]+: where:

  • caption1: - a literal substring
  • [^:]+ - 1 or more chars other than :
  • : - a literal :
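
The breakdown above can be tried as a self-contained query. This is only a sketch: the sample text from the question is written as an escape string (E'...') so the line breaks can be spelled \n, and the standalone SELECT is an assumption about how you would test it.

```sql
SELECT regexp_replace(
    E'Lorem ipsum\n\nCaption1:\nI want this text to be removed.\nAnd this line too.\n\n\nCaption2:\nConsectetuer adipiscing elit.',
    'caption1:[^:]+:',  -- no backslashes, so an ordinary literal is fine here
    '',
    'ig'                -- i = case-insensitive, g = replace every match
) AS text;
```

The match runs from Caption1: through the colon that ends Caption2:, so both captions and the text between them are removed. If the text to be removed can itself contain a colon, the match would stop at that colon instead.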
