1

Is it possible to achieve this task with a single MySQL query?

Table urls. Fields {id, url}

1, www.mysite.kom
2, mysite.kom
3, anothersite.kom

Table logs. Fields {id, url_id, group_type - a number in the range of 1..10}

1, 1, 4
2, 1, 4
3, 2, 5
4, 2, 5
5, 3, 9

The result of the query in this example should be: 1 (mysite.com and www.mysite.com = 1)

THE GOAL:

Need to count all distinct urls recorded in logs table, but with a few conditions:

1) Urls with and without www. prefix, like mysite.kom and www.mysite.kom,
should be counted as 1 (not 2).

2) Have group_type in the range of 4..6

3) Now, any of these urls with group_type 4..6, which appear in the list of those with group_type lower than 4 - should be ignored and not counted at all.

The SQL code:

SELECT COUNT(DISTINCT TRIM(LEADING 'www.' FROM b.url))
FROM logs a
INNER JOIN urls b
ON a.url_id = b.id
WHERE (group_type BETWEEN 4 AND 6) 

----- and this condition below -----

AND TRIM(LEADING 'www.' FROM b.url)
NOT IN (
  SELECT TRIM(LEADING 'www.' FROM b.url)
  FROM logs a
  INNER JOIN urls b
  ON a.url_id = b.id
  WHERE (group_type < 4)
)

If my sql query is correct, can it be optimized (to look more compact)?

3
  • Can't you just run your query to see if it's correct? Commented Apr 3, 2012 at 3:00
  • 1
    You might also want to try this question on Code Review. Commented Apr 3, 2012 at 7:00
  • @Michael Mior Yes, I tested it and it seems to be correct. Commented Apr 3, 2012 at 11:32

2 Answers 2

3
SELECT COUNT(DISTINCT u.id) AS COUNT_QUES FROM urls u
INNER JOIN logs l
ON u.id=l.url_id
WHERE u.url NOT IN (SELECT A.url FROM 
(SELECT * FROM urls u
WHERE SUBSTR(u.url,1,3)!='www')A,
(SELECT * FROM urls v
WHERE SUBSTR(v.url,1,3)='www')B
WHERE A.url=SUBSTR(B.url,5,LENGTH(B.url))
) 
AND l.group_type BETWEEN 4 AND 6
AND u.id NOT IN
(SELECT DISTINCT u.id FROM urls u
INNER JOIN logs l
ON u.id=l.url_id
WHERE u.url NOT IN (SELECT A.url FROM 
(SELECT * FROM urls u
WHERE SUBSTR(u.url,1,3)!='www')A,
(SELECT * FROM urls v
WHERE SUBSTR(v.url,1,3)='www')B
WHERE A.url=SUBSTR(B.url,5,LENGTH(B.url))
) 
AND l.group_type < 4
)

OR

SELECT COUNT(DISTINCT CASE WHEN B.URL_ID IS NOT NULL AND FLAG1 = 1 AND FLAG2 = 0 THEN TRIM(LEADING 'www.' FROM A.URL) END)
FROM URLS A 
LEFT JOIN (SELECT URL_ID,
           MAX(CASE WHEN GROUP_TYPE BETWEEN 4 AND 6 THEN 1 ELSE 0 END) FLAG1,
           MAX(CASE WHEN GROUP_TYPE < 4 THEN 1 ELSE 0 END) FLAG2
           FROM LOGS
           GROUP BY URL_ID) B
           ON A.ID = B.URL_ID

Hope this works for you.Check this on SQLFIDDLE - http://sqlfiddle.com/#!2/1fde2/39

Sign up to request clarification or add additional context in comments.

2 Comments

Note that you can replace MAX(CASE WHEN GROUP_TYPE BETWEEN 4 AND 6 THEN 1 ELSE 0 END) with MAX(GROUP_TYPE BETWEEN 4 and 6).
Yep.. the case statement can be removed.
1

Here's one way:

SELECT trimmed_url
  FROM ( SELECT TRIM(LEADING 'www.' FROM urls.url) AS trimmed_url,
                MIN(logs.group_type) AS min_group_type
           FROM logs
           JOIN urls
             ON urls.id = logs.url_id
          GROUP
             BY trimmed_url
       ) t
 WHERE min_group_type BETWEEN 4 AND 6
;

But only you can judge whether it looks more compact to you, and only testing can determine whether it performs better.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.