I have two tables.
One contains a list of products with the primary key being a product ID. Let's pretend that the ~10 columns of product info have been condensed into one.
The other contains a list of scores that users give for products. The columns are the product ID, user ID, and the rating score. There could be an order of magnitude more entries in this table than in the products table.
I want to get, in a single query, all the info for a product, as well as its average user rating and number of user ratings.
This seems like one workable way to do it:
    SELECT
        p.p_id,
        p.product_info,
        ( SELECT AVG(score) FROM ratings AS r WHERE r.p_id = p.p_id ) AS avg_rating,
        ( SELECT COUNT(score) FROM ratings AS r WHERE r.p_id = p.p_id ) AS num_ratings
    FROM products AS p
Real question: How will this perform as my database scales up? Can it use fewer subqueries, perhaps by replacing them with joins?
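For reference, this is the kind of join-based version I have in mind. It is only a sketch, not something I have benchmarked: it assumes the same table and column names as my query above, and it assumes ratings has an index that starts with p_id (which I would need to add if it is not already there). The LEFT JOIN is meant to keep products that have no ratings yet, so they come back with a NULL average and a count of 0, the same as the subquery version.

```sql
-- Sketch: one pass over ratings via a join, instead of two correlated subqueries.
-- Assumes an index on ratings(p_id); that index is my assumption, not a given.
SELECT
    p.p_id,
    p.product_info,
    AVG(r.score)   AS avg_rating,   -- NULL when the product has no ratings
    COUNT(r.score) AS num_ratings   -- 0 when the product has no ratings
FROM products AS p
LEFT JOIN ratings AS r
    ON r.p_id = p.p_id
GROUP BY
    p.p_id,
    p.product_info;
```

Grouping by both selected product columns keeps the query valid under ONLY_FULL_GROUP_BY; with the real ~10 info columns, each of them would have to be listed too, unless the server recognizes that they depend on the primary key.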
Side question: I used to have a plan where I would cache the average rating and number of ratings for each product in the products table and update them whenever a new or updated score arrives. This makes the query super simple, but my gut tells me that this is really naive. Assuming that this is an InnoDB table, can someone explain more definitively why this sort of caching may or may not be a good idea?
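To make the caching idea concrete, here is roughly what I had pictured. This is a sketch under assumptions: the avg_rating and num_ratings columns on products are hypothetical (they are not in my current schema), user_id is my guess at a name for the user ID column, and the literal IDs and score are placeholders. The rating write and the cache refresh sit in one transaction so that a failure part way through cannot leave the two tables out of step.

```sql
-- Hypothetical cache columns on products (not part of the current schema).
ALTER TABLE products
    ADD COLUMN avg_rating  DECIMAL(4,2)  NULL,
    ADD COLUMN num_ratings INT UNSIGNED  NOT NULL DEFAULT 0;

-- On each new score: write the rating and refresh the cached values together.
START TRANSACTION;

INSERT INTO ratings (p_id, user_id, score)
VALUES (42, 7, 5);                      -- placeholder values

UPDATE products AS p
SET p.avg_rating  = (SELECT AVG(r.score)   FROM ratings AS r WHERE r.p_id = 42),
    p.num_ratings = (SELECT COUNT(r.score) FROM ratings AS r WHERE r.p_id = 42)
WHERE p.p_id = 42;

COMMIT;
```

With this in place the read side becomes a plain SELECT of p_id, product_info, avg_rating, and num_ratings from products. The cost is that every rating write also updates a row in products, which is the part I am unsure about.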