Issue of Regular Expression for URL pattern on POSTGRES

Question

select regexp_replace('https://www.facebook.com/cricket/hello', '.*\..*?\/', '')

The above code is giving me

hello

instead of

cricket/hello

I checked the pattern on a regex testing website and it is correct there. I am not sure where I am going wrong.

DBMS: "PostgreSQL 8.2.15 (Greenplum Database 4.2.8.3 build 1) on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.4.2 compiled on Nov 2 2014 01:33:14"

What are you trying to do other than get cricket/hello? Is this a school or training assignment?
– Andrew Wolfe, Commented Apr 27, 2015 at 2:17

Ashish · Accepted Answer · 2015-04-27 13:57:53Z

1

I don't know why, but this worked:

.*?\.[a-z]+\/

I tested it with Andrew Wolfe's query, which covers some of the weirder kinds of URLs:

select testval, regexp_replace(testval, '.*?\.[a-z]+\/', '')
from (
  select 'https://www.facebook.com/cricket/hello' as testval
  union all
  select 'http://a.b.co.uk/cric.ke.t/hello' as testval
  union all
  select 'ftp://a.b.com.d.e.f/relroot/cricket/hello' as testval
  union all
  select 'http://www.google.co.uk/cricket/hello' as testval
  union all
  select 'http://a.b.co.uk/cricket/hello/this/is/a/little/longer?and&it=has&args' as testval
) vals
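
A possible explanation for why this pattern works, assuming Greenplum uses the same regex engine as PostgreSQL: in PostgreSQL's advanced regular expressions, a pattern with no top-level | takes its overall greediness from its first quantifier. The pattern in the question starts with a greedy .*, so the whole match is made as long as possible and the later .*? cannot shorten it. Starting with .*? instead makes the whole match as short as possible. The sketch below writes the literal dot as [.] rather than \. so that the result does not depend on how the string literal handles backslashes.

```sql
-- First quantifier is greedy (.*): the overall match is the longest one,
-- so it runs up to the last slash.
select regexp_replace('https://www.facebook.com/cricket/hello', '.*[.].*?/', '');
-- hello

-- First quantifier is non-greedy (.*?): the overall match is the shortest one,
-- so it stops at the first "dot, letters, slash" sequence.
select regexp_replace('https://www.facebook.com/cricket/hello', '.*?[.][a-z]+/', '');
-- cricket/hello
```

Most online regex testers use Perl-style engines, where each quantifier is greedy or lazy on its own, which may be why the original pattern looked correct there.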


answered Apr 27, 2015 at 13:57

Ashish


Pedro Lobito · Accepted Answer · 2015-04-27 01:46:11Z

0

Try this:

select regexp_replace('https://www.facebook.com/cricket/hello', '.*\.[a-z]+\/', '')

It also works with ccTLDs:

select regexp_replace('https://www.google.co.uk/cricket/hello', '.*\.[a-z]+\/', '')

edited Apr 27, 2015 at 1:46

answered Apr 24, 2015 at 23:15

Pedro Lobito


4 Comments

steve klein Over a year ago

@Ashish - Pedro's answer works fine for your example. Are you just trying to make this work for that one example? Otherwise, I would recommend you think carefully about the requirements and update your question (or accept this answer and post a new question). By the way, it looks like the issue with your code is that regexp_replace is matching greedily (up to the last /).

Ashish Over a year ago

I have other URLs as well, like www.google.co.uk/cricket/hello, and the answer would not work on that example. @steveklein, you are right about the greediness issue. Although I have used the non-greedy version, regexp_replace is still matching greedily. How can I solve this?

Pedro Lobito Over a year ago

I've updated my answer, give it a try. It works with any gTLD or ccTLD.

Ashish Over a year ago

@PedroLobito Did not work. It still gave hello as the result. I believe there is some issue with GreenPlum.
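
One hedged guess about the Greenplum result: PostgreSQL 8.2 treats a backslash inside an ordinary '...' literal as an escape character, so the regex engine may be receiving .*.[a-z]+/ rather than .*\.[a-z]+\/. With an unescaped dot, the greedy match runs to the last /, which would explain the hello result. Assuming that is the cause, the sketch below first displays the pattern text the server actually sees, then shows two ways to pass a literal dot that do not rely on a single backslash.

```sql
-- Diagnostic: shows the pattern after string-literal processing.
-- If the backslashes are missing from the output, the literal consumed them.
select '.*\.[a-z]+\/' as pattern_seen;

-- Option 1: escape-string literal with a doubled backslash,
-- so the regex engine receives .*\.[a-z]+/
select regexp_replace('https://www.google.co.uk/cricket/hello', E'.*\\.[a-z]+/', '');
-- cricket/hello

-- Option 2: a bracket expression for the literal dot, no backslash needed.
select regexp_replace('https://www.google.co.uk/cricket/hello', '.*[.][a-z]+/', '');
-- cricket/hello
```

The / does not need escaping in a PostgreSQL regex, so both options drop the \/ in favour of a plain /.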

Andrew Wolfe · Accepted Answer · 2015-04-27 02:36:19Z

0

I'm assuming that you want the path part of a URL.

I don't have my PostgreSQL instance up, but I'd be very explicit about each part of the URL:

'[^:]+:\/\/[A-Za-z][-a-zA-Z0-9]*(\.[A-Za-z][-a-zA-Z0-9]*)*/'

A test:

select testval, regexp_replace(testval, '[^:]+:\/\/[A-Za-z][-a-zA-Z0-9]*(\.[A-Za-z][-a-zA-Z0-9]*)*/', '')
from (
  select 'https://www.facebook.com/cricket/hello' as testval
  union all
  select 'http://a.b.co.uk/cric.ke.t/hello' as testval
  union all
  select 'ftp://a.b.com.d.e.f/relroot/cricket/hello' as testval
  union all
  select 'http://www.google.co.uk/cricket/hello' as testval
  union all
  select 'http://a.b.co.uk/cricket/hello/this/is/a/little/longer?and&it=has&args' as testval
) vals

See http://sqlfiddle.com/#!15/9eecb/857/0

answered Apr 27, 2015 at 2:36

Andrew Wolfe


1 Comment

Ashish Over a year ago

This looks great but still did not work on GreenPlum. The first four gave hello as results and the last one gave longer?and&it=has&args
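
The reported results are what one would expect if the backslashes were dropped from the pattern, so that each . matched any character, including /. That is only a guess about this Greenplum build. As a sketch of an alternative that avoids backslashes entirely (not the pattern from the answer above), the scheme and host can be stripped with negated bracket expressions: everything up to ://, then everything up to the next /.

```sql
-- Strip "scheme://host/" using only bracket expressions, no backslashes.
-- ^[^:]+  scheme (anything before the first colon)
-- ://     literal separator
-- [^/]+/  host, up to and including the first slash after it
select testval, regexp_replace(testval, '^[^:]+://[^/]+/', '') as path
from (
  select 'https://www.facebook.com/cricket/hello' as testval
  union all
  select 'http://a.b.co.uk/cric.ke.t/hello' as testval
  union all
  select 'ftp://a.b.com.d.e.f/relroot/cricket/hello' as testval
  union all
  select 'http://www.google.co.uk/cricket/hello' as testval
  union all
  select 'http://a.b.co.uk/cricket/hello/this/is/a/little/longer?and&it=has&args' as testval
) vals;
-- Expected path per input, in the order listed above:
--   cricket/hello
--   cric.ke.t/hello
--   relroot/cricket/hello
--   cricket/hello
--   cricket/hello/this/is/a/little/longer?and&it=has&args
```

This version requires a scheme, so a bare value such as www.google.co.uk/cricket/hello would be left unchanged.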
