4

I have a large database that contains many urls, there are many domains repeating and i;m trying to get only the domain. eg:

http://example.com/someurl.html
http://example.com/someurl_on_the_same_domain.html
http://example.net/myurl.php
http://example.org/anotherurl.php

and i want to get only domains, eg:

http://example.com
http://example.net
http://example.org

My query is:

SELECT id, site FROM table GROUP BY site ORDER BY id DESC LIMIT 50

I need to use regex i think but i'm not mysql guru.

5 Answers 5

11
SELECT
    SUBSTR(site, 1 , LOCATE('/', site, 8)-1)
        as OnlyDomain
    FROM table
    GROUP BY OnlyDomain
    ORDER BY id DESC LIMIT 50

[EDIT] : After OP request, here's the updated answer that will show correct results even if domain names does not have trailing slashes:

SELECT
    SUBSTR(site, 1 , IF(LOCATE('/', site, 8), LOCATE('/', site, 8)-1, LENGTH(site)))
        as OnlyDomain
    FROM tablename
    GROUP BY OnlyDomain
    ORDER BY id DESC LIMIT 50
Sign up to request clarification or add additional context in comments.

2 Comments

thx, it seems to be working but when a url only contains domain name (google.com and not google.com/url.html) it doesn't pulling it
The edited code works a treat, still reasonably fast as well and saves storing the value separately.
6
SELECT 
COUNT(*) AS nCount,
SUBSTRING_INDEX(REPLACE(REPLACE(REPLACE(site,'http://',''),'https://',''),'www.',''),'/',1) AS sDomain 
FROM tbl_table
GROUP BY sDomain 
ORDER BY 1 DESC

Addon after JQman sollution with also the www. replaced and the group by + count

Comments

1

You can use string replacement. Assuming that the "site" column contains the url:

select id, 
substr(site,1,locate('/', site ,locate('//',site)+2)-1)
from table 
group by site
order by id
desc limit 50;

Be careful to make sure that multiple slashes are encoded, eg:

http://example.com/somethingelse/someurl.html

Comments

1
SELECT id,
       SUBSTRING_INDEX(REPLACE(REPLACE(site,'http://',''),'https://',''),'/',1) as domain 
       FROM table
       GROUP BY domain 
       ORDER BY id DESC 
       LIMIT 50

That was working for me, if anybody needs it.

1 Comment

THis one actually works best. I found it here too: sitepoint.com/forums/…
0

You can select the domains with:

select left(site,6+locate('/',substring(site,8)))

1 Comment

If you need it to work on 'example.com', you should put an example like that in the question. Looks like you've already solved the problem though.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.