0
select regexp_replace('https://www.facebook.com/cricket/hello', '.*\..*?\/', '')

The above code is giving me

hello

instead of

cricket/hello

I checked on Regexp checking website and the pattern is correct. I am not sure where am I going wrong.

DBMS: "PostgreSQL 8.2.15 (Greenplum Database 4.2.8.3 build 1) on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.4.2 compiled on Nov 2 2014 01:33:14"

1
  • What are you trying to do rather than get cricket/hello ? Is this a school or training assignment? Commented Apr 27, 2015 at 2:17

3 Answers 3

1

I don't know how, but this worked

.*?\.[a-z]+\/

Taking Andrew Wolfe's query on weirdest kinds of URLs.

select testval, regexp_replace ( testval,  '.*?\.[a-z]+\/',  '')
from (
    select 'https://www.facebook.com/cricket/hello' as testval
  union all
  select 'http://a.b.co.uk/cric.ke.t/hello' as testval
  union all
  select 'ftp://a.b.com.d.e.f/relroot/cricket/hello' as testval  union all
  select 'http://www.google.co.uk/cricket/hello' as testval  
  union all
  select 'http://a.b.co.uk/cricket/hello/this/is/a/little/longer?and&it=has&args' as testval
) vals

enter image description here

Sign up to request clarification or add additional context in comments.

Comments

0

Try this:

select regexp_replace('https://www.facebook.com/cricket/hello', '.*\.[a-z]+\/', '')

Also work with cctld's :

select regexp_replace('https://www.google.co.uk/cricket/hello', '.*\.[a-z]+\/', '')

4 Comments

@Ashish - Pedro's answer works fine for your example - are you just trying to make this work for that one example? Otherwise, I would recommend you think carefully about the requirements and update your question (or accept this answer and post a new question). btw, it looks like the issue with your code is that regexp_replace is matching greedy (up the the last /).
I have other URLs as well like www.google.co.uk/cricket/hello and the answer would not work on this example. @steveklein , you are right about the greediness issue. Although I have used the non-greedy version, the regexp_replace is still going greedy. How can I solve this.
I've updated my answer, give it a try. It works with any gtld or cctld.
@PedroLobito Did not work. It still gave hello as the result. I believe there is some issue with GreenPlum.
0

I'm assuming that you want the path part of a URL.

I don't have my pg up but I'd go very explicit with each part of the URL -

'[^:]+:\/\/[A-Za-z][-a-zA-Z0-9]*(\.[A-Za-z][-a-zA-Z0-9]*)*/'

A test:

select testval, regexp_replace ( testval,  '[^:]+:\/\/[A-Za-z][-a-zA-Z0-9]*(\.[A-Za-z][-a-zA-Z0-9]*)*/',  '')
from (
    select 'https://www.facebook.com/cricket/hello' as testval
  union all
  select 'http://a.b.co.uk/cric.ke.t/hello' as testval
  union all
  select 'ftp://a.b.com.d.e.f/relroot/cricket/hello' as testval  union all
  select 'http://www.google.co.uk/cricket/hello' as testval  
  union all
  select 'http://a.b.co.uk/cricket/hello/this/is/a/little/longer?and&it=has&args' as testval
) vals

See http://sqlfiddle.com/#!15/9eecb/857/0

1 Comment

This looks great but still did not work on GreenPlum. The first four gave hello as results and the last one gave longer?and&it=has&args

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.