1

I have two tables.

One contains a list of products with the primary key being a product ID. Let's pretend that the ~10 columns of product info have been condensed into one.

The other contains a list of scores that users give for products. The columns are the product ID, user ID, and the rating score. There could be an order of magnitude more entries in this table than the products table.

I want to get, in a single query, all the info for a product, as well as its average user rating and number of user ratings.

This seems like one right way to do it:

SELECT 
    p.p_id,
    p.product_info,
    ( SELECT AVG(score) FROM ratings AS r WHERE r.p_id = p.p_id ) avg_rating,
    ( SELECT COUNT(score) FROM ratings AS r WHERE r.p_id = p.p_id ) num_ratings
    FROM products AS p

Real question: How does this look from a performance perspective as my database scales up? Can this use fewer subqueries, perhaps by replacing them with joins?

Side question: I used to have a plan where I would cache the average rating and number of ratings for each product in the products table and update them whenever a new or updated score arrives. This makes the query super simple, but my gut tells me that this is really naive. Assuming that this is an InnoDB table, can someone explain more definitively why this sort of caching may or may not be a good idea?

6 Answers

1

If product_info is a rather long VARCHAR, the following query might be faster (assuming you have a composite index (p_id, score) on ratings and p_id is indexed in products):

SELECT 
  p_id,
  product_info,
  avg_rating,
  num_ratings
FROM (
  SELECT p_id, AVG(score) as avg_rating, COUNT(score) as num_ratings
  FROM ratings
  GROUP BY p_id
) as aggr
JOIN products USING (p_id);

The order of join reflects the order in which MySQL would prefer to execute the query (since the result of the subquery is not indexed).
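For reference, the composite index assumed above could be created roughly as follows. This is only a sketch: the index name idx_ratings_pid_score is a placeholder of mine, and it assumes p_id is already the primary key of products, as the question states.

```sql
-- Composite index on (p_id, score): the GROUP BY subquery reads only these
-- two columns, so MySQL can answer it from the index without reading rows.
-- The index name is a placeholder, not something from the question.
CREATE INDEX idx_ratings_pid_score ON ratings (p_id, score);
```

Running EXPLAIN on the subquery should show whether the index is actually used.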

But the query works well only when ratings contains at least a single record for each product; otherwise you will need to add a UNION ALL with zeros for the rest of the products (which might make it significantly slower).
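A minimal sketch of that UNION ALL variant, assuming the same products and ratings tables as in the question. I return NULL rather than 0 for the missing average, which is my own choice; the NOT EXISTS test is one of several ways to find unrated products.

```sql
-- Products that have at least one rating
SELECT p_id, product_info, avg_rating, num_ratings
FROM (
  SELECT p_id, AVG(score) AS avg_rating, COUNT(score) AS num_ratings
  FROM ratings
  GROUP BY p_id
) AS aggr
JOIN products USING (p_id)

UNION ALL

-- Products with no ratings yet: no average, zero count
SELECT p.p_id, p.product_info, NULL AS avg_rating, 0 AS num_ratings
FROM products AS p
WHERE NOT EXISTS (SELECT 1 FROM ratings AS r WHERE r.p_id = p.p_id);
```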

The solution with precalculated aggregates becomes a good idea when the first query is not fast enough.
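One way the precalculated aggregates might look, as a rough sketch only. The column names avg_rating and num_ratings come from the question's query, but adding them to products, the DECIMAL(4,2) type, and the product ID 42 are assumptions for illustration.

```sql
-- Hypothetical cache columns on products (types are assumptions)
ALTER TABLE products
  ADD COLUMN avg_rating  DECIMAL(4,2) NULL,
  ADD COLUMN num_ratings INT UNSIGNED NOT NULL DEFAULT 0;

-- After inserting or updating a score for product 42 (example ID),
-- recompute its cached values, ideally in the same transaction
-- as the change to ratings.
UPDATE products AS p
SET p.avg_rating  = (SELECT AVG(r.score)   FROM ratings AS r WHERE r.p_id = p.p_id),
    p.num_ratings = (SELECT COUNT(r.score) FROM ratings AS r WHERE r.p_id = p.p_id)
WHERE p.p_id = 42;
```

Recomputing from ratings on every change, rather than adjusting the cached numbers incrementally, keeps the cache from drifting at the cost of one small indexed lookup per write.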

4 Comments

A shame that I can't accept all of your answers. I'll pick the most comprehensive one. Thanks all!
You don't need UNION ALL. An aggr RIGHT JOIN products would work fine, too.
@ypercube, if you look at the EXPLAIN EXTENDED output of the RIGHT JOIN, you will see that MySQL rewrites the query as a LEFT JOIN with the two tables swapped. That is the same join of products to the unindexed result of a subquery that has the same number of rows as the products table. Exactly what I would prefer to avoid.
@newtover: Ah, you mean that (a JOIN Grouped_b) UNION ALL (a WHERE product NOT IN B) is more efficient than a LEFT JOIN Grouped_b. I hadn't thought of that, I'll check it out.
1

You can use a single LEFT JOIN, which implies only one table scan. The separate subselects would imply more.

SELECT 
    p.p_id,
    p.product_info,
    AVG(r.score) AS avg_rating,
    COUNT(r.score) AS num_ratings
    FROM products AS p
    LEFT JOIN ratings AS r on r.p_id = p.p_id
    GROUP BY p.p_id

1
SELECT    products.p_id,
          products.product_info,
          AVG(ratings.score) AS AverageRating,
          COUNT(ratings.score) AS xRatings
FROM products LEFT JOIN ratings ON
          ratings.p_id = products.p_id
GROUP BY products.p_id

1

You can use JOIN instead of these subqueries:

SELECT 
    p.p_id,
    p.product_info,
    AVG(r.score) AS avg_rating,
    COUNT(r.p_id) AS num_ratings
FROM products AS p
    LEFT JOIN ratings r
        ON r.p_id = p.p_id
GROUP BY p.p_id

or one group subquery and then join:

SELECT 
    p.p_id,
    p.product_info,
    gr.avg_rating,
    COALESCE(gr.num_ratings, 0) AS num_ratings
FROM products AS p
    LEFT JOIN 
        ( SELECT 
              p_id,
              AVG(score) AS avg_rating,
              COUNT(*) AS num_ratings
          FROM ratings
          GROUP BY p_id
        ) AS gr
        ON gr.p_id = p.p_id

1

Try this:

SELECT 
    p.p_id
    ,   p.product_info
    ,   AVG(r.score) avg_rating
    ,   COUNT(r.score) num_ratings
FROM 
    products AS p
    -- INNER JOIN returns only products that have at least one rating
    inner join ratings AS r on r.p_id = p.p_id
group by
    p.p_id
    ,   p.product_info

1

You can use join.

SELECT 
    p.p_id,
    p.product_info,
    AVG(s.score) as avg_rating,
    COUNT(s.score) as num_ratings
FROM 
    products p
LEFT JOIN 
    ratings s
ON 
    p.p_id = s.p_id
GROUP BY 
    p.p_id

3 Comments

on the GROUP BY use p.p_id instead of p.pid.
@aF. Thanks, it's a typo, fixed.
Sometimes people don't like it when we edit their posts, so I mentioned it instead ^^
