I have a question about a query I'm writing to solve a problem from LeetCode. Here's the problem:
Table: Ads
+---------------+---------+
| Column Name | Type |
+---------------+---------+
| ad_id | int |
| user_id | int |
| action | enum |
+---------------+---------+
(ad_id, user_id) is the primary key for this table.
Each row of this table contains the ID of an Ad, the ID of a user and the action taken by this user regarding this Ad. The action column is an ENUM type of ('Clicked', 'Viewed', 'Ignored').
A company is running Ads and wants to calculate the performance of each Ad.
Performance of the Ad is measured using Click-Through Rate (CTR) where:
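$$\mathrm{CTR} = \begin{cases} 0 & \text{if there are no Ad clicks} \\ \dfrac{\text{Ad clicks}}{\text{Ad clicks} + \text{Ad views}} \times 100 & \text{otherwise} \end{cases}$$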
Write an SQL query to find the CTR of each Ad.
Round ctr to 2 decimal places. Order the result table by ctr in descending order, and by ad_id in ascending order in case of a tie.
The query result format is in the following example:
Ads table:
+-------+---------+---------+
| ad_id | user_id | action |
+-------+---------+---------+
| 1 | 1 | Clicked |
| 2 | 2 | Clicked |
| 3 | 3 | Viewed |
| 5 | 5 | Ignored |
| 1 | 7 | Ignored |
| 2 | 7 | Viewed |
| 3 | 5 | Clicked |
| 1 | 4 | Viewed |
| 2 | 11 | Viewed |
| 1 | 2 | Clicked |
+-------+---------+---------+
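To pin down what the expected answer is before writing any SQL, the formula can be applied by hand to the sample rows. This is a minimal Python sketch, not part of the LeetCode problem; `ADS`, `ctr` and `ctr_table` are illustrative names, and it assumes CTR is reported as a percentage (which is what the expected output shows).

```python
from collections import Counter

# Sample rows from the example table: (ad_id, user_id, action).
ADS = [
    (1, 1, "Clicked"), (2, 2, "Clicked"), (3, 3, "Viewed"), (5, 5, "Ignored"),
    (1, 7, "Ignored"), (2, 7, "Viewed"), (3, 5, "Clicked"), (1, 4, "Viewed"),
    (2, 11, "Viewed"), (1, 2, "Clicked"),
]


def ctr(clicks: int, views: int) -> float:
    """CTR as a percentage rounded to 2 places; 0 when there are no clicks."""
    if clicks == 0:
        return 0.0
    return round(clicks * 100 / (clicks + views), 2)


def ctr_table(rows):
    # Counter returns 0 for ads that never appear with the given action.
    clicks = Counter(ad for ad, _, action in rows if action == "Clicked")
    views = Counter(ad for ad, _, action in rows if action == "Viewed")
    # Every ad is reported, even one whose only rows are 'Ignored'.
    ads = {ad for ad, _, _ in rows}
    result = [(ad, ctr(clicks[ad], views[ad])) for ad in ads]
    # Highest CTR first; ties broken by ad_id ascending.
    return sorted(result, key=lambda r: (-r[1], r[0]))


print(ctr_table(ADS))  # [(1, 66.67), (3, 50.0), (2, 33.33), (5, 0.0)]
```

For example, ad 1 has 2 clicks and 1 view, so its CTR is 2 * 100 / (2 + 1) = 66.67. Python's `round` and MySQL's `ROUND()` break exact ties differently, but no value in this data lands on a tie.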
Here's a fiddle with the sample data and my attempted solution, which is reproduced below:
SELECT DISTINCT t.ad_id, ROUND(
IF(
COUNT(c.ad_id) OVER (PARTITION BY t.ad_id) = 0,
0,
COUNT(c.ad_id) OVER (PARTITION BY t.ad_id) * 100 / ( COUNT(c.ad_id) OVER (PARTITION BY t.ad_id) + COUNT(v.ad_id) OVER (PARTITION BY t.ad_id) )
), 2) as ctr
FROM Ads as t
LEFT JOIN Ads as c ON c.ad_id=t.ad_id AND c.user_id=t.user_id AND c.action='Clicked'
LEFT JOIN Ads as v ON v.ad_id=t.ad_id AND v.user_id=t.user_id AND v.action='Viewed'
GROUP BY t.ad_id, c.ad_id, v.ad_id
ORDER BY ctr DESC, t.ad_id
Result from this query:
+-------+-------+
| ad_id | ctr   |
+-------+-------+
| 1     | 50.00 |
| 2     | 50.00 |
| 3     | 50.00 |
| 5     | 0.00  |
+-------+-------+
The correct result should show:
+-------+-------+
| ad_id | ctr   |
+-------+-------+
| 1     | 66.67 |
| 3     | 50.00 |
| 2     | 33.33 |
| 5     | 0.00  |
+-------+-------+
From looking at the sample data, my guess is that COUNT() is not actually partitioning by t.ad_id as I'd expect. The 50% CTR results could be explained by my CTR calculation counting every 'Clicked' row and every 'Viewed' row in the table. (On the other hand, removing the OVER clauses from inside the CTR calculation, but not from the condition, doesn't produce the results above, as my hypothesis suggests it would. So I'm not sure about this.)
Is there something wrong with the way I'm using OVER? Is my logic flawed here?
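One way to test the hypothesis is to look at the rows the window functions actually operate on. Window functions are evaluated after GROUP BY, so `COUNT(c.ad_id) OVER (PARTITION BY t.ad_id)` counts the grouped rows, not the original rows of Ads. The sketch below uses Python's built-in `sqlite3` as a stand-in for MySQL, which is an assumption: it needs SQLite 3.25 or later for window functions, `IF()` is rewritten as `CASE`, and `100.0` replaces `100` because SQLite's `/` truncates when both operands are integers. `ROWS`, `JOINS`, `GROUPED_ROWS`, `ORIGINAL` and `run` are illustrative names.

```python
import sqlite3

# Sample data from the question: (ad_id, user_id, action).
ROWS = [
    (1, 1, "Clicked"), (2, 2, "Clicked"), (3, 3, "Viewed"), (5, 5, "Ignored"),
    (1, 7, "Ignored"), (2, 7, "Viewed"), (3, 5, "Clicked"), (1, 4, "Viewed"),
    (2, 11, "Viewed"), (1, 2, "Clicked"),
]

# FROM / JOIN / GROUP BY part shared by both queries, copied from the attempt.
JOINS = """
FROM Ads AS t
LEFT JOIN Ads AS c
  ON c.ad_id = t.ad_id AND c.user_id = t.user_id AND c.action = 'Clicked'
LEFT JOIN Ads AS v
  ON v.ad_id = t.ad_id AND v.user_id = t.user_id AND v.action = 'Viewed'
GROUP BY t.ad_id, c.ad_id, v.ad_id
"""

# The rows left after GROUP BY: the only input the window functions see.
GROUPED_ROWS = "SELECT t.ad_id, c.ad_id, v.ad_id" + JOINS + "ORDER BY 1, 2, 3"

# The attempted query, ported to SQLite (CASE instead of IF, 100.0 instead of 100).
ORIGINAL = """
SELECT DISTINCT t.ad_id, ROUND(CASE
    WHEN COUNT(c.ad_id) OVER (PARTITION BY t.ad_id) = 0 THEN 0
    ELSE COUNT(c.ad_id) OVER (PARTITION BY t.ad_id) * 100.0
         / (COUNT(c.ad_id) OVER (PARTITION BY t.ad_id)
            + COUNT(v.ad_id) OVER (PARTITION BY t.ad_id))
  END, 2) AS ctr
""" + JOINS + "ORDER BY ctr DESC, t.ad_id"


def run(sql):
    """Load the sample data into an in-memory database and run one query."""
    conn = sqlite3.connect(":memory:")
    # SQLite has no ENUM type, so action is stored as TEXT.
    conn.execute(
        "CREATE TABLE Ads (ad_id INTEGER, user_id INTEGER, action TEXT,"
        " PRIMARY KEY (ad_id, user_id))"
    )
    conn.executemany("INSERT INTO Ads VALUES (?, ?, ?)", ROWS)
    rows = conn.execute(sql).fetchall()
    conn.close()
    return rows


for row in run(GROUPED_ROWS):
    print(row)
print(run(ORIGINAL))
```

Grouping by c.ad_id and v.ad_id collapses duplicate clicks and duplicate views: ad 1's two clicks become a single (1, 1, NULL) group, and ad 2's two views become a single (2, NULL, 2) group. Ads 1, 2 and 3 therefore each keep exactly one "clicked" group and one "viewed" group, so each gets 1 * 100 / (1 + 1) = 50, while ad 5 has neither and gets 0. The partitioning itself works as intended; the rows being counted are not the original ones.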
Also, I have a bonus question: I'm choosing to use a JOIN here because I'm assuming that JOIN is faster than using subqueries. Is this a fair assumption? I'm studying for a Data Analyst 1 interview - do you think the interviewer will even care if I use a JOIN vs subquery?
Edit: Thanks to forpas' explanations, I was able to come up with a much simpler solution than my original. I think forpas' solution in his answer below may still be preferable since it explicitly deals with NULLs in the table.
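-- In MySQL a comparison evaluates to 1 (true) or 0 (false), so SUM(action='Clicked') counts the clicks per ad.
SELECT ad_id, ROUND(IF(
    SUM(action='Clicked') = 0,  -- no clicks, so CTR is 0
    0,
    SUM(action='Clicked') * 100 / ( SUM(action='Clicked') + SUM(action='Viewed'))
), 2) as ctr
FROM Ads
GROUP BY ad_id
ORDER BY ctr DESC, ad_id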
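To check the simplified query against the sample data, here is a sketch that runs it in SQLite through Python's `sqlite3`. SQLite stands in for MySQL here, which is an assumption: `IF()` is rewritten as `CASE`, and `100.0` forces real division because SQLite's `/` truncates integer operands (MySQL's `/` does not). `ROWS`, `QUERY` and `result` are illustrative names.

```python
import sqlite3

# Sample data from the question: (ad_id, user_id, action).
ROWS = [
    (1, 1, "Clicked"), (2, 2, "Clicked"), (3, 3, "Viewed"), (5, 5, "Ignored"),
    (1, 7, "Ignored"), (2, 7, "Viewed"), (3, 5, "Clicked"), (1, 4, "Viewed"),
    (2, 11, "Viewed"), (1, 2, "Clicked"),
]

# The simplified query, ported to SQLite. Comparisons evaluate to 1 or 0
# in SQLite too, so SUM(action = 'Clicked') counts clicks per ad.
QUERY = """
SELECT ad_id, ROUND(CASE
    WHEN SUM(action = 'Clicked') = 0 THEN 0
    ELSE SUM(action = 'Clicked') * 100.0
         / (SUM(action = 'Clicked') + SUM(action = 'Viewed'))
  END, 2) AS ctr
FROM Ads
GROUP BY ad_id
ORDER BY ctr DESC, ad_id
"""

conn = sqlite3.connect(":memory:")
# SQLite has no ENUM type, so action is stored as TEXT.
conn.execute(
    "CREATE TABLE Ads (ad_id INTEGER, user_id INTEGER, action TEXT,"
    " PRIMARY KEY (ad_id, user_id))"
)
conn.executemany("INSERT INTO Ads VALUES (?, ?, ?)", ROWS)
result = conn.execute(QUERY).fetchall()
conn.close()

print(result)
```

This should list ad 1 at 66.67, ad 3 at 50.0, ad 2 at 33.33 and ad 5 at 0.0, matching the expected result. One caveat about NULLs: if every action for some ad were NULL, each `SUM(...)` would return NULL, the `= 0` test would not be true, and the ad's ctr would come out NULL rather than 0, in both SQLite and MySQL.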