2

Please help me to extract the text after the last occurrence of the Schedule : and the end of the line.

Lane Closures : Lane one will be closed
Reason : Roadworks are planned
Status : Pending
Schedule : Expect disruption everyday between 20:00 and 06:00 from 5 October 2020 to 9 October 2020
Schedule : Expect disruption everyday between 20:00 and 06:00 from 12 October 2020 to 16 October 2020
Schedule : Expect disruption everyday between 20:00 and 06:00 from 19 October 2020 to 23 October 2020
Schedule : Expect disruption everyday between 20:00 and 06:00 from 26 October 2020 to 31 October 2020
Lanes Closed : There will be one of two lanes closed

In the above case, I need to extract Expect disruption everyday between 20:00 and 06:00 from 26 October 2020 to 31 October 2020

So far I came up only with the following:

(?<=Schedule : ).*(?![\s\S]*Schedule)

But it doesn't work in Postgres. It returns error: invalid regular expression: invalid escape \ sequence

I also tried to replace \s and \S with [:space:] and ^[:space:] as per Postgres documentation but it doesn't work either

Thanks in advance.

1
  • In PostgreSQL, you do not need [\s\S], just use a dot, .. Thus, you need (?<=Schedule : )[^\r\n]+(?!.*Schedule) Commented Dec 3, 2020 at 8:41

1 Answer 1

3

Since a . in a PostgreSQL regex matches any char including line break chars, you need to introduce two changes:

  • The first .* should be replaced with [^\r\n]+ to match any chars other than common line break chars
  • The [\s\S] in the lookahead should be replaced with a mere ..

You can use

(?<=Schedule : )[^\r\n]+(?!.*Schedule)

See the online demo:

SELECT REGEXP_MATCHES(
    E'Lane Closures : Lane one will be closed\nReason : Roadworks are planned\nStatus : Pending\nSchedule : Expect disruption everyday between 20:00 and 06:00 from 5 October 2020 to 9 October 2020\nSchedule : Expect disruption everyday between 20:00 and 06:00 from 12 October 2020 to 16 October 2020\nSchedule : Expect disruption everyday between 20:00 and 06:00 from 19 October 2020 to 23 October 2020\nSchedule : Expect disruption everyday between 20:00 and 06:00 from 26 October 2020 to 31 October 2020\nLanes Closed : There will be one of two lanes closed', 
    '(?<=Schedule : )[^\r\n]+(?!.*Schedule)', 
    'g') 

Output:

enter image description here

Sign up to request clarification or add additional context in comments.

1 Comment

@DmytroSly Just FYI, you do not need the lookbehind, just use a capturing group around the pattern part you need to extract, 'Schedule\s*:\s*([^\r\n]+)(?!.*Schedule)' would work the same here.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.