Get domain from URL in Oracle SQL

Question

I have a database that contains website URLs, and I'd like to extract the domain name from each one. Here are two (quite different) examples:

http://www.example.com       -> example.com
example.co.uk/dir/index.html -> example.co.uk

To do this I am using regular expressions with Oracle's REGEXP_SUBSTR and REGEXP_REPLACE functions. I use REGEXP_REPLACE to delete a leading http:// or https:// and www. (replacing them with an empty string). Then I use REGEXP_SUBSTR to get the string from the beginning up to the first /, or the whole string if there is no /. My code looks like this:

REGEXP_SUBSTR(REGEXP_REPLACE(website_url, '^http[s]?://(www\.)?|^www\.', '', 1), '(.+?)(/|$)')

Everything works as expected, except that my regex does not exclude the trailing /:

example.com/dir/index.html -> example.com/

I would like to get rid of the /. How do I do that?
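The / ends up in the result because REGEXP_SUBSTR returns the whole match, and the pattern (.+?)(/|$) includes the / in that match. The parentheses only define groups; they do not trim what is returned. Below is a minimal sketch of two ways around this, reusing the question's REGEXP_REPLACE step unchanged. The CTE names tab and stripped and the sample rows are illustrative. The second column relies on the sixth (subexpression) argument of REGEXP_SUBSTR, which requires Oracle 11g or later.

```sql
-- Sample rows from the question; "tab" and "website_url" are illustrative names.
WITH tab AS (
  SELECT 'http://www.example.com'       AS website_url FROM dual UNION ALL
  SELECT 'example.co.uk/dir/index.html'                FROM dual UNION ALL
  SELECT 'example.com/dir/index.html'                  FROM dual
),
stripped AS (
  -- Step 1 (as in the question): drop a leading http[s]:// and/or www.
  SELECT website_url,
         REGEXP_REPLACE(website_url, '^http[s]?://(www\.)?|^www\.', '') AS rest
    FROM tab
)
SELECT website_url,
       -- Step 2a: first run of characters that are not "/",
       -- so the slash can never be part of the match.
       REGEXP_SUBSTR(rest, '[^/]+') AS domain,
       -- Step 2b (11g+): return only capture group 1 instead of the whole match.
       REGEXP_SUBSTR(rest, '^([^/]+)', 1, 1, NULL, 1) AS domain_11g
  FROM stripped;
```

For the three sample rows, both columns should read example.com, example.co.uk and example.com. The negated class [^/]+ also works on releases before 11g, because it never needs a capture group.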

San · Accepted Answer · 2014-01-11 16:07:48Z

7

Use this:

WITH tab AS 
 (SELECT 'https://www.example.co.uk/dir/index.html' AS website_url 
    FROM dual)
SELECT REGEXP_SUBSTR(REGEXP_REPLACE(website_url, '^http[s]?://(www\.)?|^www\.', '', 1), '\w+(\.\w+)+') 
  FROM tab;

output:

|REGEXP_SUBSTR(REGEXP_REPLACE(W|
--------------------------------
|example.co.uk                 |


3 Comments

Foaly Over a year ago

This works very nicely! Thank you very much. Sadly, it doesn't work for URLs that include a -. For example, the URL www.top.i-am-a-example.com gives top.i. I tried, but I can't fix it. Do you know how?

San Over a year ago

Adding a permissible character range could be one solution to this: REGEXP_SUBSTR(REGEXP_REPLACE(website_url, '^http[s]?://(www\.)?|^www\.', '', 1), '[a-z,A-Z,0-9,-]+(\.\w+)+')

Foaly Over a year ago

Yes, adding a range seems to be the only option. Using your code I still get top.i. I am not an expert on regex, so I don't know why... It looks correct to me.
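Both patterns stop at top.i because only the first label is allowed to contain a hyphen. Every later label is matched by \w+, and \w does not include -, so matching ends at the first hyphen after a dot. (Inside brackets the commas are literal characters, so [a-z,A-Z,0-9,-] also admits commas.) Below is a sketch that allows hyphens in every dot-separated label by using the POSIX class [[:alnum:]] in both places. The sample row is the one from the comment above.

```sql
WITH tab AS (
  SELECT 'www.top.i-am-a-example.com' AS website_url FROM dual
)
SELECT REGEXP_SUBSTR(
         REGEXP_REPLACE(website_url, '^http[s]?://(www\.)?|^www\.', ''),
         -- Letters, digits and "-" are allowed in EVERY label, not just the first.
         -- "-" is placed last inside the brackets so it is taken literally.
         '[[:alnum:]-]+(\.[[:alnum:]-]+)+'
       ) AS domain
  FROM tab;
```

This should return top.i-am-a-example.com.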

Foaly · Accepted Answer · 2014-01-11 22:09:15Z

5

Thanks to the hints in the answers I finally got it working!

The code I am using now looks like this:

REGEXP_REPLACE(website_url, '(http[s]?://)?(www\.)?(.*?)((/|:)(.)*|$)', '\3')

Thanks for the help everybody!


Sabuj Hassan · Accepted Answer · 2014-01-11 16:07:58Z

1

I'm not sure whether Oracle supports ?: to make a group non-capturing.

REGEXP_REPLACE(website_url, '^(?:(?:http[s]?://)?www\.)?(.*?)(?:/.*|$)', '\1')

If it doesn't, then this one:

REGEXP_REPLACE(website_url, '^((http[s]?://)?www\.)?(.*?)(/.*|$)', '\3')


1 Comment

Foaly Over a year ago

As far as I can see, Oracle does not support ?:. The second one works as expected, but somehow it does not work for URLs like www.example.com/dir/index.html, where it returns example.comdir/index.html.
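One way to avoid that leftover path, whatever its exact cause, is to drop the non-greedy .*? altogether. You can match the host as a greedy run of characters that are neither / nor :, then let a trailing .*$ consume everything after it, so nothing survives the replacement. This is a sketch that assumes URLs without credentials (user:pass@host) or IPv6 literals, since both contain a : before the end of the host. The sample rows are illustrative.

```sql
WITH tab AS (
  SELECT 'www.example.com/dir/index.html'   AS website_url FROM dual UNION ALL
  SELECT 'https://www.example.co.uk:8080/x'                FROM dual UNION ALL
  SELECT 'example.com'                                     FROM dual
)
SELECT website_url,
       REGEXP_REPLACE(
         website_url,
         -- \1 = optional scheme, \2 = optional "www.",
         -- \3 = host: greedy run of characters that are neither "/" nor ":".
         -- ".*$" then swallows any port and path, so nothing is left behind.
         '^(https?://)?(www\.)?([^/:]+).*$',
         '\3'
       ) AS domain
  FROM tab;
```

For these rows the expected results are example.com, example.co.uk and example.com. Because the pattern is anchored at both ends, it can only match once per string.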

Community · Accepted Answer · 2017-05-23 10:31:33Z

You could use the following regex matching something_without_a_dot.something_without_a_dot from the end of the string. You'll get the answer in the first group. If you need the TLD also, you can enclose everything in () except the $.

([^.]+)\.[^.]+$

In SQL, that gives:

SQL> select regexp_replace('sub1.sub2.domain.com', '^.*?([^.]+)\.[^.]+$', '\1') from dual;

REGEXP
------
domain

The non-greedy .*? at the start allows you to ignore the start of the string.

To get the domain name plus the TLD:

SQL> select regexp_replace('sub1.sub2.domain.com', '^.*?([^.]+\.[^.]+)$', '\1') from dual;

REGEXP_REP
----------
domain.com

To take into account co.uk:

SQL> select regexp_replace('sub1.sub2.domain.co.uk', '^.*?([^.]+\.(co\.uk|[^.]+))$', '\1') from dual;

REGEXP_REPLA
------------
domain.co.uk
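The patterns above assume the input is already a bare host name. Below is a sketch that chains them with a host-extraction step, assuming co.uk is the only multi-part suffix of interest. Other suffixes, such as com.au, would have to be added to the alternation. Because the final pattern keeps only the last labels, a leading www. does not need to be stripped separately. The CTE names and sample rows are illustrative.

```sql
WITH tab AS (
  SELECT 'https://www.sub.example.co.uk/dir/index.html' AS website_url FROM dual UNION ALL
  SELECT 'http://www.example.com'                                      FROM dual
),
hosts AS (
  -- Host only: drop the scheme, then keep everything before the first "/" or ":".
  SELECT website_url,
         REGEXP_SUBSTR(REGEXP_REPLACE(website_url, '^https?://', ''), '[^/:]+') AS host
    FROM tab
)
SELECT website_url,
       -- Last label plus TLD, treating "co.uk" as one TLD (pattern from this answer).
       REGEXP_REPLACE(host, '^.*?([^.]+\.(co\.uk|[^.]+))$', '\1') AS domain
  FROM hosts;
```

The two rows should give example.co.uk and example.com. A single-label host such as localhost does not match the final pattern, so REGEXP_REPLACE returns it unchanged.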

Source

Tala · Accepted Answer · 2016-05-24 01:02:43Z

0

Why not use (HTTP)URITYPE and extract the host from that?
