
I'm trying to use this script (which extracts the domain from a URL):

CREATE FUNCTION [dbo].[parseURL]  (@strURL varchar(1000))
RETURNS varchar(1000)
AS
BEGIN
IF CHARINDEX('http://',@strURL) > 0 OR CHARINDEX('https://',@strURL) > 0
SELECT @strURL = REPLACE(@strURL,'https://','')
SELECT @strURL = REPLACE(@strURL,'http://','')
SELECT @strURL = REPLACE(@strURL,'www','')
-- Remove everything after "/" if one exists
IF CHARINDEX('/',@strURL) > 0 (SELECT @strURL = LEFT(@strURL,CHARINDEX('/',@strURL)-1))    
RETURN @strURL
END

Using this SQL command:

SELECT COUNT(*) as theCount, dbo.parseURL(url) as url FROM hit
WHERE url IS NOT NULL
GROUP BY dbo.parseURL(url)
ORDER BY thecount DESC

But I'm receiving this error:

#1064 - You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '[dbo].[parseURL] (@strURL varchar(1000)) RETURNS varchar(1000) AS BEGIN IF ' at line 1

I am relatively new to both SQL and MySQL, so I'm not sure whether the error is general to SQL or specific to MySQL. What am I missing?

  • The function is for SQL Server, using TSQL - it will never work as-is on MySQL. Commented Jul 31, 2010 at 0:16
  • Interesting. What would you recommend using MySQL, then? Commented Jul 31, 2010 at 0:18
  • Yup. Working on a MySQL version... Now. Commented Jul 31, 2010 at 0:18

2 Answers


Jesus Christ Ponies, effort wasted, but here it is:

DELIMITER |
CREATE FUNCTION DOMAIN (url VARCHAR(255))
 RETURNS VARCHAR(255) DETERMINISTIC
  BEGIN
   DECLARE pos INT;
    -- Drop the scheme (everything up to and including '://'), if there is one
    SET pos := LOCATE('://', url);
    SET url := CASE WHEN pos=0 THEN url ELSE SUBSTR(url, pos+3) END;
    SET url := REPLACE(url,'www.','');
    -- Keep only the part before the first '/'
    RETURN SUBSTRING_INDEX(url, '/', 1);
   END |
DELIMITER ;

Then query:

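  SELECT COUNT(*) AS theCount,
         DOMAIN(url) AS url
    FROM hit
   WHERE url IS NOT NULL
-- Group by the expression: in GROUP BY, a bare "url" resolves to the table column before the alias
GROUP BY DOMAIN(url)
ORDER BY theCount DESC;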

My function has the advantage of stripping any protocol (https, http, ftp, whatever), because it cuts at the first '://' instead of matching specific schemes.
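To see why cutting at '://' handles any scheme, the built-ins can be run one at a time on a literal. This is only a sketch: the sample URL is made up, and `@u` is a session variable used purely for illustration.

```sql
-- Made-up sample URL, stored in a session variable for convenience
SET @u := 'ftp://www.example.com/pub/file.txt';

-- LOCATE(substr, str) returns the 1-based position of substr, or 0 if absent
SELECT LOCATE('://', @u) AS pos;                                     -- 4

-- Skipping 3 characters past that position drops the scheme, whatever it is
SELECT SUBSTR(@u, LOCATE('://', @u) + 3) AS no_scheme;               -- www.example.com/pub/file.txt

-- SUBSTRING_INDEX(str, delim, 1) keeps everything before the first delim
SELECT SUBSTRING_INDEX('example.com/pub/file.txt', '/', 1) AS host;  -- example.com

-- With no '/' present, SUBSTRING_INDEX returns the whole string unchanged
SELECT SUBSTRING_INDEX('example.com', '/', 1) AS host;               -- example.com
```

Because SUBSTRING_INDEX already copes with a missing delimiter, the function needs no separate "is there a slash?" check.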


2 Comments

+1: I think yours is more efficient - I was just doing a quick & dirty change.
@OMG Probably. I pulled a bunch of random URLs from some urllist.txt files (about 25000 of them) and ran the functions. Mine got an average of around 0.52s, yours around 0.73s.

Here's the equivalent MySQL function:

DELIMITER $$

CREATE FUNCTION parseURL(strURL VARCHAR(200))
  RETURNS VARCHAR(200) DETERMINISTIC
BEGIN
  -- REPLACE is a no-op when the substring is absent, so no guard is needed
  SET strURL = REPLACE(strURL,'https://','');
  SET strURL = REPLACE(strURL,'http://','');
  -- Strip 'www.' including the dot, so no leading '.' is left behind
  SET strURL = REPLACE(strURL,'www.','');
  -- Remove everything after "/" if one exists
  IF INSTR(strURL, '/') > 0 THEN
    SET strURL = LEFT(strURL, INSTR(strURL, '/')-1);
  END IF;

  RETURN strURL;
END$$

DELIMITER ;

Use this query:

  SELECT COUNT(*) as theCount, 
         parseURL(url) as url 
    FROM hit
   WHERE url IS NOT NULL
GROUP BY parseURL(url)
ORDER BY thecount DESC
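One thing to watch when porting other T-SQL string code the same way: the argument order is not the same across these functions. T-SQL's CHARINDEX takes the substring first, as does MySQL's LOCATE, while MySQL's INSTR takes the string first. A small sketch on a made-up literal:

```sql
-- INSTR(str, substr) and LOCATE(substr, str) return the same 1-based position
SELECT INSTR('example.com/path', '/')  AS instr_pos,    -- 12
       LOCATE('/', 'example.com/path') AS locate_pos;   -- 12

-- LEFT(str, n) keeps the first n characters, so position - 1 stops just before the '/'
SELECT LEFT('example.com/path', INSTR('example.com/path', '/') - 1) AS host;  -- example.com
```

Swapping the arguments by accident does not raise an error; the function simply returns 0, so the slash is never found and the path is silently left in place.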
