I'm building a query that generates an index page of Items for a selected Category, ordered by relative popularity. Popularity is based on the number of Likes an item has received and the number of times it has been added to a List in the past 24 hours. The single input to the query is the main category ID.

This involves a total of 4 tables, one of which is a nested set, so it's not exactly trivial. I'm generally pretty adept at writing reasonably efficient SQL, but I'm struggling to make the JOINs work the way I want.

Categories

Since the categories are nested and items are assigned to a single category, it is necessary to first select all categories which fall underneath the specified one in the query input.

I am using the awesome_nested_set gem to make this work. It adds lft and rgt columns which can be used to select from the hierarchy without difficulty:

SELECT c2.*
FROM categories c1
JOIN categories c2
    ON c2.lft >= c1.lft AND c2.rgt <= c1.rgt
WHERE c1.id = [MAIN CATEGORY ID]
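To make the lft/rgt containment check concrete, here is a minimal sketch that runs the same self-join against an in-memory SQLite database (a stand-in for MySQL, used only so the example is runnable). The category names and lft/rgt values are made up for illustration, and `subtree_names` is a hypothetical helper, not part of awesome_nested_set.

```python
import sqlite3

# Hypothetical sample tree (names and lft/rgt values are invented):
#   Electronics (1, 8)
#     Phones (2, 5)
#       Smartphones (3, 4)
#     Laptops (6, 7)
#   Books (9, 10)
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE categories (id INTEGER PRIMARY KEY, name TEXT, lft INTEGER, rgt INTEGER)"
)
conn.executemany(
    "INSERT INTO categories (id, name, lft, rgt) VALUES (?, ?, ?, ?)",
    [
        (1, "Electronics", 1, 8),
        (2, "Phones", 2, 5),
        (3, "Smartphones", 3, 4),
        (4, "Laptops", 6, 7),
        (5, "Books", 9, 10),
    ],
)

# Same self-join as above: c2 is inside c1 when its interval is contained in c1's.
SUBTREE_SQL = """
SELECT c2.name
FROM categories c1
JOIN categories c2
    ON c2.lft >= c1.lft AND c2.rgt <= c1.rgt
WHERE c1.id = ?
ORDER BY c2.lft
"""


def subtree_names(category_id):
    """Return the names of the category and all of its descendants."""
    return [row[0] for row in conn.execute(SUBTREE_SQL, (category_id,))]


electronics = subtree_names(1)
phones = subtree_names(2)
print(electronics)
print(phones)
```

Note that the selected category itself is included in the result, because the comparison uses >= and <=.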

Items

Then extending the above to select out the items is rather simple:

SELECT i.*
FROM categories c1
JOIN categories c2
    ON c2.lft >= c1.lft AND c2.rgt <= c1.rgt
JOIN items i
    ON i.category_id = c2.id
WHERE c1.id = [MAIN CATEGORY ID]

Everything to this point works fine & executes quickly. The last thing to do (ignoring pagination of course) is to order them.

Popularity

Items are ordered by popularity. The way to calculate an item's popularity is:

(number of likes) + (number of times added to list) * 5

e.g. if an item had been added to 32 lists & liked 483 times, the popularity metric would be 643.
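The arithmetic can be written out as a small sketch. The names `popularity` and `LIST_WEIGHT` are only illustrative, they do not exist anywhere in the application.

```python
LIST_WEIGHT = 5  # each list addition counts five times as much as a like


def popularity(likes, lists):
    """Popularity metric as described above: likes + lists * 5."""
    return likes + lists * LIST_WEIGHT


example = popularity(likes=483, lists=32)
print(example)  # 483 + 32 * 5 = 643
```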

Depending on whether the user is viewing 'all time most popular' or 'trending', we might restrict the calculation of those metrics to the likes/lists which have happened in the past day.
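One way to share a single count query between the two views is to pass the cutoff as a parameter and let NULL mean "no restriction". The sketch below assumes this approach and uses SQLite with fixed timestamps so it is reproducible; in MySQL the cutoff would come from something like DATE_SUB(NOW(), INTERVAL 1 DAY). The helper `count_likes` and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE likes (id INTEGER PRIMARY KEY, item_id INTEGER, created_at TEXT)"
)
conn.executemany(
    "INSERT INTO likes (item_id, created_at) VALUES (?, ?)",
    [
        (1, "2024-01-01 09:00:00"),  # older than the cutoff used below
        (1, "2024-01-09 18:30:00"),
        (1, "2024-01-10 08:15:00"),
    ],
)

# A NULL cutoff disables the date filter ("all time");
# otherwise only rows newer than the cutoff are counted ("trending").
COUNT_SQL = """
SELECT COUNT(*)
FROM likes
WHERE item_id = :item_id
  AND (:cutoff IS NULL OR created_at > :cutoff)
"""


def count_likes(item_id, cutoff=None):
    """Count likes for one item, optionally restricted to rows after cutoff."""
    row = conn.execute(COUNT_SQL, {"item_id": item_id, "cutoff": cutoff}).fetchone()
    return row[0]


all_time = count_likes(1)
trending = count_likes(1, cutoff="2024-01-09 12:00:00")
print(all_time, trending)
```

An OR in the WHERE clause can make it harder for the optimizer to use an index on created_at, so generating two separate query variants may turn out to be the better choice. That would need to be checked with EXPLAIN on real data.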

I thought this would be relatively trivial, but it did not end up being so. There are apparently issues which arise when you use COUNT along with JOINs and I needed to use LEFT JOINs in case the item had 0 likes/lists.

The currently working code is as follows:

SELECT
    q.*,
    (q.likes + q.lists * 5) AS popularity
FROM
(
    SELECT
        i.*,
        (SELECT COUNT(*) FROM likes l WHERE i.id = l.item_id AND l.created_at > DATE_SUB(NOW(), INTERVAL 1 day)) AS likes,
        (SELECT COUNT(*) FROM list_items li WHERE i.id = li.item_id AND li.created_at > DATE_SUB(NOW(), INTERVAL 1 day)) AS lists
    FROM categories c1
    JOIN categories c2
        ON c2.lft >= c1.lft AND c2.rgt <= c1.rgt
    JOIN items i
        ON i.category_id = c2.id
    WHERE c1.id = 37
) q
ORDER BY popularity

However, this is clearly quite horrific code. Every item requires two correlated subqueries, and then the entire thing needs to be wrapped in an outer query just to do some arithmetic (though I would assume that part isn't too bad).

I have tried the following things, but they have not worked for various reasons:

SELECT
    i.*,
    (SELECT COUNT(*) FROM likes l WHERE i.id = l.item_id AND l.created_at > DATE_SUB(NOW(), INTERVAL 1 day)) AS likes,
    (SELECT COUNT(*) FROM list_items li WHERE i.id = li.item_id AND li.created_at > DATE_SUB(NOW(), INTERVAL 1 day)) AS lists,
    (likes + lists * 5) AS popularity

For some reason, you can't do math on column aliases (likes, lists) defined in the same SELECT list.

SELECT
    i.*,
    COUNT(l.id) as likes,
    COUNT(li.id) as lists
FROM categories c1
JOIN categories c2
    ON c2.lft >= c1.lft AND c2.rgt <= c1.rgt
JOIN items i
    ON i.category_id = c2.id
LEFT JOIN likes l
    ON l.item_id = i.id
LEFT JOIN list_items li
    ON li.item_id = i.id
WHERE c1.id = 37

You only get one result for some reason. I don't understand the cause of this.

SELECT
    i.*,
    COUNT(l.id) as likes,
    COUNT(li.id) as lists
FROM categories c1
JOIN categories c2
    ON c2.lft >= c1.lft AND c2.rgt <= c1.rgt
JOIN items i
    ON i.category_id = c2.id
LEFT JOIN likes l
    ON l.item_id = i.id
LEFT JOIN list_items li
    ON li.item_id = i.id
WHERE c1.id = 37
GROUP BY i.id

Adding the GROUP BY returns all items, but the likes/lists counts are now completely wrong. I think it is adding them up or something.

Basically, I'm a bit stuck. The example above with subqueries does work, but I don't think it works in an ideal way. I'd like to make it work solely with JOINs, but am struggling to understand how.

Any help is much appreciated :)

1 Answer

Use a subquery grouped by item_id to get each set of counts, and LEFT JOIN against those subqueries.

Something like this:

SELECT
    q.*,
    (q.likes + q.lists * 5) AS popularity
FROM
(
    SELECT
        i.*,
        IFNULL(likes_count, 0) AS likes,
        IFNULL(lists_count, 0) AS lists
    FROM categories c1
    JOIN categories c2
        ON c2.lft >= c1.lft AND c2.rgt <= c1.rgt
    JOIN items i
        ON i.category_id = c2.id
    LEFT OUTER JOIN
    (
        -- likes per item in the past day
        SELECT item_id, COUNT(*) AS likes_count
        FROM likes
        WHERE created_at > DATE_SUB(NOW(), INTERVAL 1 DAY)
        GROUP BY item_id
    ) likes
    ON likes.item_id = i.id
    LEFT OUTER JOIN
    (
        -- list additions per item in the past day
        SELECT item_id, COUNT(*) AS lists_count
        FROM list_items
        WHERE created_at > DATE_SUB(NOW(), INTERVAL 1 DAY)
        GROUP BY item_id
    ) lists
    ON lists.item_id = i.id
    WHERE c1.id = 37
) q
ORDER BY popularity
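
As for why the plain JOIN attempts misbehave: an aggregate such as COUNT without a GROUP BY collapses the whole result into a single row, and joining both likes and list_items to the same item produces one row per (like, list) pair, so each COUNT sees the product of the two. The sketch below reproduces this on made-up data and compares it with the pre-aggregated joins. It uses SQLite as a stand-in for MySQL, leaves out the category join and the date filter to keep the effect isolated, and sorts with DESC on the assumption that the most popular items should come first.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE likes (id INTEGER PRIMARY KEY, item_id INTEGER);
CREATE TABLE list_items (id INTEGER PRIMARY KEY, item_id INTEGER);

INSERT INTO items (id, name) VALUES (1, 'A'), (2, 'B'), (3, 'C');
-- Item A: 3 likes, 2 list additions. Item B: 1 like. Item C: nothing.
INSERT INTO likes (item_id) VALUES (1), (1), (1), (2);
INSERT INTO list_items (item_id) VALUES (1), (1);
""")

# Joining both child tables directly: item A yields 3 x 2 = 6 joined rows,
# so both counts come out as 6.
NAIVE_SQL = """
SELECT i.name, COUNT(l.id) AS likes, COUNT(li.id) AS lists
FROM items i
LEFT JOIN likes l ON l.item_id = i.id
LEFT JOIN list_items li ON li.item_id = i.id
GROUP BY i.id
ORDER BY i.id
"""

# Aggregating each child table first leaves at most one row per item
# on each side of the join, so nothing is multiplied.
PREAGG_SQL = """
SELECT
    i.name,
    IFNULL(l.likes_count, 0) AS likes,
    IFNULL(li.lists_count, 0) AS lists,
    IFNULL(l.likes_count, 0) + IFNULL(li.lists_count, 0) * 5 AS popularity
FROM items i
LEFT JOIN (
    SELECT item_id, COUNT(*) AS likes_count FROM likes GROUP BY item_id
) l ON l.item_id = i.id
LEFT JOIN (
    SELECT item_id, COUNT(*) AS lists_count FROM list_items GROUP BY item_id
) li ON li.item_id = i.id
ORDER BY popularity DESC, i.id
"""

naive = conn.execute(NAIVE_SQL).fetchall()
fixed = conn.execute(PREAGG_SQL).fetchall()
print(naive)
print(fixed)
```

Repeating the IFNULL expressions in the popularity column, as done here, is one way to avoid the outer wrapper query. Whether that reads better than the wrapper is a matter of taste.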