I have a question about a query I'm writing to solve a problem from LeetCode. Here's the problem:

Ads

+---------------+---------+
| Column Name   | Type    |
+---------------+---------+
| ad_id         | int     |
| user_id       | int     |
| action        | enum    |
+---------------+---------+

(ad_id, user_id) is the primary key for this table.

Each row of this table contains the ID of an Ad, the ID of a user and the action taken by this user regarding this Ad. The action column is an ENUM type of ('Clicked', 'Viewed', 'Ignored').

A company is running Ads and wants to calculate the performance of each Ad.

Performance of the Ad is measured using Click-Through Rate (CTR) where:

CTR = 0 if the ad has no clicks; otherwise CTR = Ad clicks / (Ad clicks + Ad views) * 100

Write an SQL query to find the ctr of each Ad.

Round ctr to 2 decimal places. Order the result table by ctr in descending order and by ad_id in ascending order in case of a tie.

The query result format is in the following example:

Ads table:

+-------+---------+---------+
| ad_id | user_id | action  |
+-------+---------+---------+
| 1     | 1       | Clicked |
| 2     | 2       | Clicked |
| 3     | 3       | Viewed  |
| 5     | 5       | Ignored |
| 1     | 7       | Ignored |
| 2     | 7       | Viewed  |
| 3     | 5       | Clicked |
| 1     | 4       | Viewed  |
| 2     | 11      | Viewed  |
| 1     | 2       | Clicked |
+-------+---------+---------+

Here's a fiddle with the sample data and my attempted solution. Attempted solution reproduced below:

SELECT DISTINCT t.ad_id, ROUND(
    IF(
        COUNT(c.ad_id) OVER (PARTITION BY t.ad_id) = 0, 
        0,
      COUNT(c.ad_id) OVER (PARTITION BY t.ad_id) * 100 / ( COUNT(c.ad_id) OVER (PARTITION BY t.ad_id) + COUNT(v.ad_id) OVER (PARTITION BY t.ad_id) )
    ), 2) as ctr
FROM Ads as t
LEFT JOIN Ads as c ON c.ad_id=t.ad_id AND c.user_id=t.user_id AND c.action='Clicked'
LEFT JOIN Ads as v ON v.ad_id=t.ad_id AND v.user_id=t.user_id AND v.action='Viewed'
GROUP BY t.ad_id, c.ad_id, v.ad_id
ORDER BY ctr DESC, t.ad_id

Result from this query:

ad_id   ctr
1   50.00
2   50.00
3   50.00
5   0.00

The correct result should show:

ad_id   ctr
1   66.67
3   50.00
2   33.33
5   0.00

From looking at the sample data, my guess is that COUNT() is not in fact partitioning by t.ad_id as I'd expect. The 50% CTR results could be explained by my CTR calculation counting all 'Clicked' and all 'Viewed' rows in the table rather than only the rows for each ad. On the other hand, removing the OVER clauses inside the CTR calculation (just the calculation, not the condition) doesn't produce the results above, as my hypothesis would predict, so I'm not sure about this.

Is there something wrong with the way I'm using OVER? Is my logic flawed here?

Also, I have a bonus question: I'm choosing to use a JOIN here because I'm assuming that JOIN is faster than using subqueries. Is this a fair assumption? I'm studying for a Data Analyst 1 interview - do you think the interviewer will even care if I use a JOIN vs subquery?

Edit: Thanks to forpas' explanations, I was able to come up with a much simpler solution than my original. I think forpas' solution in his answer below may still be preferable since it explicitly deals with NULLs in the table.

SELECT ad_id, ROUND(IF(
    SUM(action='Clicked') = 0,
    0,
    SUM(action='Clicked') * 100 / ( SUM(action='Clicked') + SUM(action='Viewed'))
), 2) as ctr
FROM Ads
GROUP BY ad_id
ORDER BY ctr DESC, ad_id

2 Answers

You can do it with conditional aggregation:

SELECT ad_id,
  ROUND(100 * COALESCE(SUM(action = 'Clicked') / SUM(action IN ('Clicked', 'Viewed')), 0), 2) ctr
FROM Ads
GROUP BY ad_id
ORDER BY ctr DESC, ad_id;
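
As a quick sanity check of the query above against the sample data, here is a minimal sketch that runs it through Python's built-in sqlite3 module. SQLite is only a stand-in for MySQL here (my assumption, not what the demo uses). Like MySQL, it evaluates action = 'Clicked' to 0 or 1 and returns NULL on division by zero, but it does integer division on integers, so the sketch multiplies by 100.0 before dividing.

```python
import sqlite3

# Sample rows from the question: (ad_id, user_id, action).
ROWS = [
    (1, 1, "Clicked"), (2, 2, "Clicked"), (3, 3, "Viewed"),
    (5, 5, "Ignored"), (1, 7, "Ignored"), (2, 7, "Viewed"),
    (3, 5, "Clicked"), (1, 4, "Viewed"), (2, 11, "Viewed"),
    (1, 2, "Clicked"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Ads (ad_id INT, user_id INT, action TEXT, "
    "PRIMARY KEY (ad_id, user_id))"
)
conn.executemany("INSERT INTO Ads VALUES (?, ?, ?)", ROWS)

# Conditional aggregation: each comparison yields 1 or 0, so SUM() counts matches.
# For ad 5 the denominator is 0, the division yields NULL, and COALESCE maps it to 0.
QUERY = """
SELECT ad_id,
       ROUND(COALESCE(100.0 * SUM(action = 'Clicked')
                      / SUM(action IN ('Clicked', 'Viewed')), 0), 2) AS ctr
FROM Ads
GROUP BY ad_id
ORDER BY ctr DESC, ad_id
"""

ctr_rows = conn.execute(QUERY).fetchall()
for ad_id, ctr in ctr_rows:
    print(ad_id, ctr)
```

The row order should match the expected result in the question: ads 1, 3, 2, 5.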

You could get the same results with SUM() window function, but I don't think that this is better for performance or readability:

SELECT DISTINCT ad_id,
  ROUND(
    100 * 
    COALESCE(
      SUM(action = 'Clicked') OVER (PARTITION BY ad_id) / 
      SUM(action IN ('Clicked', 'Viewed')) OVER (PARTITION BY ad_id)
      , 0
    )
    , 2
  ) ctr
FROM Ads
ORDER BY ctr DESC, ad_id;

See the demo.
Results:

> ad_id |   ctr
> ----: | ----:
>     1 | 66.67
>     3 | 50.00
>     2 | 33.33
>     5 |  0.00

7 Comments

Thanks for the answer! That's good to know about the COALESCE function, I'll check that out. Do you have any ideas as to why the CTR expression as I wrote it doesn't work as I'd expect?
You are overcomplicating things. Check here: dbfiddle.uk/… what is the resultset of your joins.
Thanks for the link. I recognize that my solution is perhaps not the simplest. I'd really like to know how to make it work, mostly because understanding why my method fails could help me in an interview if I make a similar mistake on a similar question. I'd really appreciate insight into why my specific syntax doesn't result in the correct calculation. Do you have any ideas here?
The problem with your query is that you use GROUP BY. If you remove it, you will get the correct results: dbfiddle.uk/… But as I said, this is not the way to go.
Huh, that fixes it. Do you know why GROUP BY would change the results that way? Is GROUP BY changing the content of the table? I thought it just changed the order of the rows, and I assumed that it would not affect COUNT() on the JOINed tables.
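
On why GROUP BY changes the result: window functions are evaluated after GROUP BY, so COUNT(...) OVER (PARTITION BY t.ad_id) counts the grouped rows, not the original joined rows. Grouping by t.ad_id, c.ad_id, v.ad_id leaves at most one "clicked" group and one "viewed" group per ad, which is where 1 / (1 + 1) = 50% comes from. The sketch below illustrates this with Python's built-in sqlite3 module. SQLite (3.25 or later, for window functions) is an assumed stand-in for MySQL, and the helper name window_counts is made up for the example.

```python
import sqlite3

# Sample rows from the question: (ad_id, user_id, action).
ROWS = [
    (1, 1, "Clicked"), (2, 2, "Clicked"), (3, 3, "Viewed"),
    (5, 5, "Ignored"), (1, 7, "Ignored"), (2, 7, "Viewed"),
    (3, 5, "Clicked"), (1, 4, "Viewed"), (2, 11, "Viewed"),
    (1, 2, "Clicked"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Ads (ad_id INT, user_id INT, action TEXT, "
    "PRIMARY KEY (ad_id, user_id))"
)
conn.executemany("INSERT INTO Ads VALUES (?, ?, ?)", ROWS)

# Same self-joins as the query in the question.
JOINS = """
FROM Ads AS t
LEFT JOIN Ads AS c
  ON c.ad_id = t.ad_id AND c.user_id = t.user_id AND c.action = 'Clicked'
LEFT JOIN Ads AS v
  ON v.ad_id = t.ad_id AND v.user_id = t.user_id AND v.action = 'Viewed'
"""


def window_counts(with_group_by):
    """Return (ad_id, clicks, views) as seen by the window functions."""
    group_by = "GROUP BY t.ad_id, c.ad_id, v.ad_id" if with_group_by else ""
    sql = f"""
    SELECT DISTINCT t.ad_id,
           COUNT(c.ad_id) OVER (PARTITION BY t.ad_id) AS clicks,
           COUNT(v.ad_id) OVER (PARTITION BY t.ad_id) AS views
    {JOINS}
    {group_by}
    ORDER BY t.ad_id
    """
    return conn.execute(sql).fetchall()


# With GROUP BY, the windows see one row per (t.ad_id, c.ad_id, v.ad_id) group.
print("with GROUP BY:   ", window_counts(True))
# Without GROUP BY, the windows see every joined row.
print("without GROUP BY:", window_counts(False))
```

With GROUP BY, ads 1, 2 and 3 each show 1 click and 1 view. Without it, the counts are the real ones (2 and 1 for ad 1, 1 and 2 for ad 2, 1 and 1 for ad 3), which give 66.67, 33.33 and 50.00.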
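
You can also compute the two totals in separate subqueries and join them:

SELECT t1.ad_id,
       CASE
         -- clicked / total is NULL when total = 0 (MySQL returns NULL on division by zero)
         WHEN t2.clicked / t1.total IS NULL THEN 0
         ELSE ROUND(t2.clicked / t1.total * 100, 2)
       END AS ctr
FROM (
  -- clicks + views per ad
  SELECT ad_id, SUM(CASE WHEN action IN ('Viewed', 'Clicked') THEN 1 ELSE 0 END) AS total
  FROM Ads
  GROUP BY ad_id
) t1
LEFT JOIN (
  -- clicks per ad
  SELECT ad_id, SUM(CASE WHEN action = 'Clicked' THEN 1 ELSE 0 END) AS clicked
  FROM Ads
  GROUP BY ad_id
) t2 ON t1.ad_id = t2.ad_id
ORDER BY ctr DESC, t1.ad_id;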
