MySQL subqueries with math operations

Question

I have two tables.

One contains a list of products with the primary key being a product ID. Let's pretend that the ~10 columns of product info have been condensed into one.

The other contains a list of scores that users give for products. The columns are the product ID, user ID, and the rating score. There could be an order of magnitude more entries in this table than the products table.

I want to get, in a single query, all the info for a product, as well as its average user rating and number of user ratings.

This seems like one right way to do it:

SELECT 
    p.p_id,
    p.product_info,
    ( SELECT AVG(score) FROM ratings AS r WHERE r.p_id = p.p_id ) avg_rating,
    ( SELECT COUNT(score) FROM ratings AS r WHERE r.p_id = p.p_id ) num_ratings
    FROM products AS p

Real question: How does this look from a performance perspective as my database scales up? Can this use fewer subqueries, perhaps by replacing them with joins?

Side question: I used to have a plan where I would cache the average rating and number of ratings for each product in the products table and update them whenever a new or updated score arrives. This makes the query super simple, but my gut tells me that this is really naive. Assuming that this is an InnoDB table, can someone explain more definitively why this sort of caching may or may not be a good idea?

newtover · Accepted Answer · 2012-01-26 10:52:40Z

1

If product_info is a rather long VARCHAR, the following query might be faster (assuming you have a composite index (p_id, score) on ratings and p_id is indexed in products):

SELECT 
  p_id,
  product_info,
  avg_rating,
  num_ratings
FROM (
  SELECT p_id, AVG(score) as avg_rating, COUNT(score) as num_ratings
  FROM ratings
  GROUP BY p_id
) as aggr
JOIN products USING (p_id);

The order of join reflects the order in which MySQL would prefer to execute the query (since the result of the subquery is not indexed).

But the query returns every product only when ratings contains at least a single record for each product, because the inner join drops products that have no ratings. Otherwise you will need to add a UNION ALL with zeros for the rest of the products (which might make it significantly slower).
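A minimal sketch of that UNION ALL variant, run from Python against an in-memory SQLite database as a stand-in for MySQL. The sample rows and the u_id column name are made up for illustration, and the second branch returns NULL for the average (with 0 for the count) rather than a literal zero, which is a choice on my part and not something the answer specifies.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; the rows below are invented sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (p_id INTEGER PRIMARY KEY, product_info TEXT);
    CREATE TABLE ratings  (p_id INTEGER, u_id INTEGER, score INTEGER);
    CREATE INDEX ratings_pid_score ON ratings (p_id, score);
    INSERT INTO products VALUES (1, 'widget'), (2, 'gadget'), (3, 'gizmo');
    INSERT INTO ratings  VALUES (1, 10, 4), (1, 11, 2), (2, 10, 5);
""")

query = """
    SELECT p_id, product_info, avg_rating, num_ratings
    FROM (
        SELECT p_id, AVG(score) AS avg_rating, COUNT(score) AS num_ratings
        FROM ratings
        GROUP BY p_id
    ) AS aggr
    JOIN products USING (p_id)
    UNION ALL
    -- Products with no ratings at all: NULL average, zero count.
    SELECT p.p_id, p.product_info, NULL, 0
    FROM products AS p
    WHERE NOT EXISTS (SELECT 1 FROM ratings AS r WHERE r.p_id = p.p_id)
    ORDER BY 1
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Product 3 has no ratings, so it only appears because of the second branch. Whether this beats a plain LEFT JOIN on a real MySQL server is something to check with EXPLAIN on your own data.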

The solution with precalculated aggregates becomes a good idea when the first query is not fast enough.
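For reference, one way the precalculated aggregates could be maintained is with triggers. This is only a sketch under assumptions: it runs on SQLite through Python rather than MySQL (MySQL trigger syntax differs, e.g. it requires FOR EACH ROW), the cache columns num_ratings and sum_ratings and the trigger names are hypothetical, and deletes are not handled.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; column and trigger names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (
        p_id INTEGER PRIMARY KEY,
        product_info TEXT,
        num_ratings INTEGER NOT NULL DEFAULT 0,  -- cached count
        sum_ratings INTEGER NOT NULL DEFAULT 0   -- cached sum; average = sum / count
    );
    CREATE TABLE ratings (
        p_id INTEGER, u_id INTEGER, score INTEGER,
        PRIMARY KEY (p_id, u_id)
    );

    -- A new score bumps both cached values.
    CREATE TRIGGER ratings_ai AFTER INSERT ON ratings
    BEGIN
        UPDATE products
        SET num_ratings = num_ratings + 1,
            sum_ratings = sum_ratings + NEW.score
        WHERE p_id = NEW.p_id;
    END;

    -- A changed score only adjusts the cached sum.
    CREATE TRIGGER ratings_au AFTER UPDATE OF score ON ratings
    BEGIN
        UPDATE products
        SET sum_ratings = sum_ratings - OLD.score + NEW.score
        WHERE p_id = NEW.p_id;
    END;
""")

conn.execute(
    "INSERT INTO products (p_id, product_info) VALUES (1, 'widget'), (2, 'gadget')"
)
conn.executemany("INSERT INTO ratings VALUES (?, ?, ?)", [(1, 10, 4), (1, 11, 2)])
conn.execute("UPDATE ratings SET score = 5 WHERE p_id = 1 AND u_id = 11")

# Reading the cache needs no join and no aggregation.
cached = conn.execute("""
    SELECT p_id, product_info, num_ratings,
           CASE WHEN num_ratings > 0
                THEN 1.0 * sum_ratings / num_ratings END AS avg_rating
    FROM products
    ORDER BY p_id
""").fetchall()
print(cached)
```

Storing the sum instead of the average keeps each update a simple increment. The cost is an extra write to products on every rating change, which is the trade-off the question asks about.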

edited Jan 26, 2012 at 10:52

answered Jan 26, 2012 at 10:46

newtover


4 Comments

fixception Over a year ago

A shame that I can't accept all of your answers. I'll pick the most comprehensive one. Thanks all!

ypercubeᵀᴹ Over a year ago

You don't need UNION ALL. An aggr RIGHT JOIN products would work fine, too.

newtover Over a year ago

@ypercube, if you look at the EXPLAIN EXTENDED of the RIGHT JOIN, you will see that MySQL rewrites the query as a LEFT JOIN with the tables swapped. That is the same join of products to the unindexed result of a subquery that has the same number of rows as the products table. Exactly what I would prefer to avoid.

ypercubeᵀᴹ Over a year ago

@newtover: Ah, you mean that (a JOIN Grouped_b) UNION ALL (a WHERE product NOT IN B) is more efficient than a LEFT JOIN Grouped_b. I hadn't thought of that, I'll check it out.

aF. · Accepted Answer · 2012-01-26 10:27:19Z

1

You can use a single left join, which implies only one table scan. The separate subselects would imply more!

SELECT 
    p.p_id,
    p.product_info,
    AVG(r.score) AS avg_rating,
    COUNT(r.score) AS num_ratings
    FROM products AS p
    LEFT JOIN ratings AS r on r.p_id = p.p_id
    GROUP BY p.p_id
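A small check of how this LEFT JOIN treats a product with no ratings, sketched in Python with an in-memory SQLite database standing in for MySQL (the sample rows, the u_id column name and the extra joined_rows column are assumptions added for illustration). It shows why COUNT(r.score) is the right thing to count here and COUNT(*) is not.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; the rows below are invented sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (p_id INTEGER PRIMARY KEY, product_info TEXT);
    CREATE TABLE ratings  (p_id INTEGER, u_id INTEGER, score INTEGER);
    INSERT INTO products VALUES (1, 'widget'), (2, 'gadget'), (3, 'gizmo');
    INSERT INTO ratings  VALUES (1, 10, 4), (1, 11, 2), (2, 10, 5);
""")

rows = conn.execute("""
    SELECT p.p_id,
           AVG(r.score)   AS avg_rating,
           COUNT(r.score) AS num_ratings,  -- skips the NULL produced by the outer join
           COUNT(*)       AS joined_rows   -- counts the NULL-extended row as well
    FROM products AS p
    LEFT JOIN ratings AS r ON r.p_id = p.p_id
    GROUP BY p.p_id
    ORDER BY p.p_id
""").fetchall()

for row in rows:
    print(row)
```

For the unrated product the average comes back as NULL and num_ratings as 0, while COUNT(*) would report 1.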

answered Jan 26, 2012 at 10:27

aF.


John Woo · Accepted Answer · 2012-01-26 10:31:22Z

1

SELECT    products.p_id,
          products.product_info,
          AVG(ratings.score) AS AverageRating,
          COUNT(ratings.score) AS xRatings
FROM products LEFT JOIN ratings ON
          ratings.p_id = products.p_id
GROUP BY products.p_id

answered Jan 26, 2012 at 10:31

John Woo


ypercubeᵀᴹ · Accepted Answer · 2012-01-26 10:32:36Z

1

You can use JOIN instead of these subqueries:

SELECT 
    p.p_id,
    p.product_info,
    AVG(r.score) AS avg_rating,
    COUNT(r.p_id) AS num_ratings
FROM products AS p
    LEFT JOIN ratings r
        ON r.p_id = p.p_id
GROUP BY p.p_id

or one group subquery and then join:

SELECT 
    p.p_id,
    p.product_info,
    gr.avg_rating,
    COALESCE(gr.num_ratings, 0) AS num_ratings
FROM products AS p
    LEFT JOIN 
        ( SELECT 
              p_id,
              AVG(score) AS avg_rating,
              COUNT(*) AS num_ratings
          FROM ratings
          GROUP BY p_id
        ) AS gr
        ON gr.p_id = p.p_id
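To see that the grouped subquery with COALESCE gives the same rows as the correlated subqueries from the question, here is a sketch in Python using an in-memory SQLite database as a stand-in for MySQL. The sample data and the u_id column name are invented, and the table is named ratings as in the question.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; the rows below are invented sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (p_id INTEGER PRIMARY KEY, product_info TEXT);
    CREATE TABLE ratings  (p_id INTEGER, u_id INTEGER, score INTEGER);
    INSERT INTO products VALUES (1, 'widget'), (2, 'gadget'), (3, 'gizmo');
    INSERT INTO ratings  VALUES (1, 10, 4), (1, 11, 2), (2, 10, 5);
""")

# The query from the question: two correlated subqueries per product.
correlated = conn.execute("""
    SELECT p.p_id, p.product_info,
           (SELECT AVG(score)   FROM ratings AS r WHERE r.p_id = p.p_id) AS avg_rating,
           (SELECT COUNT(score) FROM ratings AS r WHERE r.p_id = p.p_id) AS num_ratings
    FROM products AS p
    ORDER BY p.p_id
""").fetchall()

# Aggregate once, then LEFT JOIN; COALESCE turns the missing count into 0.
derived = conn.execute("""
    SELECT p.p_id, p.product_info, gr.avg_rating,
           COALESCE(gr.num_ratings, 0) AS num_ratings
    FROM products AS p
    LEFT JOIN (
        SELECT p_id, AVG(score) AS avg_rating, COUNT(*) AS num_ratings
        FROM ratings
        GROUP BY p_id
    ) AS gr ON gr.p_id = p.p_id
    ORDER BY p.p_id
""").fetchall()

print(derived)
print(correlated == derived)
```

The two agree on this data. They could differ if score were allowed to be NULL, since COUNT(*) counts such rows and COUNT(score) does not.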

edited Jan 26, 2012 at 10:32

answered Jan 26, 2012 at 10:26

ypercubeᵀᴹ


aF. · Accepted Answer · 2012-01-26 10:37:23Z

1

Try this:

SELECT 
    p.p_id
    ,   p.product_info
    ,   AVG(r.score) avg_rating
    ,   COUNT(r.score) num_ratings
FROM 
    products AS p
    inner join ratings AS r on r.p_id = p.p_id
group by
    p.p_id
    ,   p.product_info

edited Jan 26, 2012 at 10:37

aF.


answered Jan 26, 2012 at 10:32

devers


ypercubeᵀᴹ · Accepted Answer · 2012-01-26 10:41:16Z

1

You can use join.

SELECT 
    p.p_id,
    p.product_info,
    AVG(s.score) as avg_rating,
    COUNT(s.score) as num_ratings
FROM 
    products p
LEFT JOIN 
    ratings s
ON 
    p.p_id = s.p_id
GROUP BY 
    p.p_id

edited Jan 26, 2012 at 10:41

ypercubeᵀᴹ


answered Jan 26, 2012 at 10:26

xdazz


3 Comments

aF. Over a year ago

on the GROUP BY use p.p_id instead of p.pid.

xdazz Over a year ago

@aF. Thanks, it's a typo, fixed.

aF. Over a year ago

Sometimes people don't like it when we edit their posts, so I mentioned it instead ^^
