How to optimize a SQL query using multiple tables

Question

I have this SQL query that grabs the 5 latest news posts. I want it to also return the total likes and total comments for each post in the same query. But the query I wrote seems a little slow when working with large amounts of data, so I am trying to find a better solution. Here it is:

SELECT *, 
`id` as `newscode`, 
(SELECT COUNT(*) FROM `likes` WHERE `type`="newspost" AND `code`=`newscode`) as `total_likes`,
(SELECT COUNT(*) FROM `news_comments` WHERE `post_id`=`newscode`) as `total_comments`
FROM `news` ORDER BY `id` DESC LIMIT 5

Here is a SQLFiddle as well: http://sqlfiddle.com/#!2/d3ecbf/1

MonkeyZeus · Accepted Answer · 2014-02-17 22:50:26Z

1

I would recommend adding total_likes and total_comments fields to the news table, which get incremented/decremented whenever a like or comment is added or removed.

Your likes and news_comments tables should be used for historical purposes only.

This strenuous counting should not be performed every time a page is loaded because that is a complete waste of resources.
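Below is a minimal sketch of the counter-column idea, with the increment/decrement done in application code. It assumes a simplified, hypothetical schema and uses Python's `sqlite3` as a stand-in for MySQL. The helper names `add_like`, `remove_like`, `add_comment` and `latest_news` are illustrative only. Each write and its counter update share one transaction so the two cannot drift apart, and the page-load query only reads the stored totals.

```python
import sqlite3

# Hypothetical, simplified schema; SQLite stands in for MySQL here.
SCHEMA = """
CREATE TABLE news (
    id             INTEGER PRIMARY KEY,
    title          TEXT,
    total_likes    INTEGER NOT NULL DEFAULT 0,  -- denormalized counter
    total_comments INTEGER NOT NULL DEFAULT 0   -- denormalized counter
);
CREATE TABLE likes (id INTEGER PRIMARY KEY, type TEXT, code INTEGER);
CREATE TABLE news_comments (id INTEGER PRIMARY KEY, post_id INTEGER, body TEXT);
"""


def add_like(conn, news_id):
    """Record a like and bump the post's counter in the same transaction."""
    with conn:  # commits on success, rolls back if either statement fails
        cur = conn.execute(
            "INSERT INTO likes (type, code) VALUES ('newspost', ?)", (news_id,))
        conn.execute(
            "UPDATE news SET total_likes = total_likes + 1 WHERE id = ?", (news_id,))
    return cur.lastrowid


def remove_like(conn, like_id):
    """Delete a like (an 'Un-Like') and decrement the matching counter."""
    with conn:
        row = conn.execute(
            "SELECT code FROM likes WHERE id = ? AND type = 'newspost'",
            (like_id,)).fetchone()
        if row is None:
            return  # nothing to undo
        conn.execute("DELETE FROM likes WHERE id = ?", (like_id,))
        conn.execute(
            "UPDATE news SET total_likes = total_likes - 1 WHERE id = ?", (row[0],))


def add_comment(conn, news_id, body):
    """Comments follow the same pattern: insert the row, then bump the counter."""
    with conn:
        conn.execute(
            "INSERT INTO news_comments (post_id, body) VALUES (?, ?)", (news_id, body))
        conn.execute(
            "UPDATE news SET total_comments = total_comments + 1 WHERE id = ?",
            (news_id,))


def latest_news(conn, limit=5):
    """The page-load query: no counting, just read the stored totals."""
    return conn.execute(
        "SELECT id, title, total_likes, total_comments "
        "FROM news ORDER BY id DESC LIMIT ?", (limit,)).fetchall()
```

The catch with this approach is that every code path that writes a like or comment must remember to update the counter. That is what the trigger suggestion in the comments below addresses.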

answered Feb 17, 2014 at 22:50

MonkeyZeus

20.8k reputation · 4 gold badges · 41 silver badges · 83 bronze badges


3 Comments

DRapp Over a year ago

@user3205106, that's where I was going too, but it should be done with an INSERT trigger on the likes / comments tables. This is a common issue, especially on the web: DENORMALIZE counts that would otherwise bring queries to a crawl.

MonkeyZeus Over a year ago

Thank you @DRapp. I assume you simply forgot to mention that it should fire upon DELETE as well as INSERT, because users can usually perform both a Like and an Un-Like, unless the OP locks them into their decision once they like something. The same rule applies to comments, since comments are likely to be moderated. Hence the increment/decrement portion of my answer.

DRapp Over a year ago

Correct, it should be on INSERT, UPDATE and DELETE, but triggers would still be the better approach for denormalizing totals.
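Here is a minimal sketch of the trigger approach discussed above, covering INSERT and DELETE on both tables and an UPDATE that moves a comment to another post. It uses SQLite (via Python's `sqlite3`) rather than MySQL, and the trigger names and `counts` helper are hypothetical. MySQL's syntax differs. MySQL requires `FOR EACH ROW`, does not support SQLite's `WHEN` clause (an `IF` inside `BEGIN ... END` would be needed instead), and usually needs a `DELIMITER` change in the command-line client. Treat this as an illustration of the logic, not a drop-in MySQL script.

```python
import sqlite3

# Hypothetical schema plus triggers that keep news.total_likes / total_comments
# in sync with the likes and news_comments tables (SQLite syntax).
SCHEMA = """
CREATE TABLE news (
    id             INTEGER PRIMARY KEY,
    total_likes    INTEGER NOT NULL DEFAULT 0,
    total_comments INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE likes (id INTEGER PRIMARY KEY, type TEXT, code INTEGER);
CREATE TABLE news_comments (id INTEGER PRIMARY KEY, post_id INTEGER, body TEXT);

-- Only likes of type 'newspost' count towards news.total_likes.
CREATE TRIGGER likes_ai AFTER INSERT ON likes
WHEN NEW.type = 'newspost'
BEGIN
    UPDATE news SET total_likes = total_likes + 1 WHERE id = NEW.code;
END;

CREATE TRIGGER likes_ad AFTER DELETE ON likes
WHEN OLD.type = 'newspost'
BEGIN
    UPDATE news SET total_likes = total_likes - 1 WHERE id = OLD.code;
END;

CREATE TRIGGER comments_ai AFTER INSERT ON news_comments
BEGIN
    UPDATE news SET total_comments = total_comments + 1 WHERE id = NEW.post_id;
END;

CREATE TRIGGER comments_ad AFTER DELETE ON news_comments
BEGIN
    UPDATE news SET total_comments = total_comments - 1 WHERE id = OLD.post_id;
END;

-- A comment moved to another post: take it off the old post, add it to the new one.
CREATE TRIGGER comments_au AFTER UPDATE OF post_id ON news_comments
BEGIN
    UPDATE news SET total_comments = total_comments - 1 WHERE id = OLD.post_id;
    UPDATE news SET total_comments = total_comments + 1 WHERE id = NEW.post_id;
END;
"""


def counts(conn):
    """Return (id, total_likes, total_comments) for every news row."""
    return conn.execute(
        "SELECT id, total_likes, total_comments FROM news ORDER BY id").fetchall()
```

Because the database maintains the counters, application code can insert and delete likes and comments with plain statements and cannot forget the bookkeeping.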

GarethD · Accepted Answer · 2014-02-17 23:44:56Z

0

You could rewrite this using joins; MySQL has known issues with subqueries, especially when dealing with large data sets:

SELECT  n.*, 
        `id` as `newscode`, 
        COALESCE(l.TotalLikes, 0) AS `total_likes`,
        COALESCE(c.TotalComments, 0) AS `total_comments`
FROM    `news` n
        LEFT JOIN
        (   SELECT  Code, COUNT(*) AS TotalLikes
            FROM    `likes` 
            WHERE   `type` = "newspost" 
            GROUP BY Code
        ) AS l
            ON l.`code` = n.`id`
        LEFT JOIN
        (   SELECT  post_id, COUNT(*) AS TotalComments
            FROM    `news_comments` 
            GROUP BY post_id
        ) AS c
            ON c.`post_id` = n.`id`
ORDER BY n.`id` DESC LIMIT 5;

The reason is that when you use a join as above, MySQL materialises the results of each subquery when it is first needed. For example, at the start of this query, MySQL will put the results of:

SELECT  post_id, COUNT(*) AS TotalComments
FROM    `news_comments` 
GROUP BY post_id

into an in-memory table and hash post_id for faster lookups. Then, for each row in news, it only has to look up TotalComments in this hashed table. When you use a correlated subquery, by contrast, MySQL executes the subquery once for each row in news, which results in a large number of executions when news is large. If the initial result set is small, you may not see a performance benefit, and it may even be worse.
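The sketch below checks that the two query shapes above (correlated subqueries versus joined, pre-aggregated derived tables) return the same rows, including zero counts for posts with no likes or comments. It runs on SQLite through Python's `sqlite3` with a small, made-up data set (`build_sample` is a hypothetical helper). SQLite's planner is not MySQL's, so this demonstrates equivalence only, not which form is faster.

```python
import sqlite3

# Original shape: one COUNT(*) subquery per news row.
CORRELATED = """
SELECT n.id,
       (SELECT COUNT(*) FROM likes
         WHERE type = 'newspost' AND code = n.id) AS total_likes,
       (SELECT COUNT(*) FROM news_comments
         WHERE post_id = n.id) AS total_comments
FROM news n
ORDER BY n.id DESC LIMIT 5
"""

# Rewritten shape: aggregate once per table, then LEFT JOIN the totals.
DERIVED_JOIN = """
SELECT n.id,
       COALESCE(l.TotalLikes, 0)    AS total_likes,
       COALESCE(c.TotalComments, 0) AS total_comments
FROM news n
LEFT JOIN (SELECT code, COUNT(*) AS TotalLikes FROM likes
            WHERE type = 'newspost' GROUP BY code) AS l ON l.code = n.id
LEFT JOIN (SELECT post_id, COUNT(*) AS TotalComments FROM news_comments
            GROUP BY post_id) AS c ON c.post_id = n.id
ORDER BY n.id DESC LIMIT 5
"""


def build_sample():
    """Deterministic sample: post i gets i likes and (i % 3) comments."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE news (id INTEGER PRIMARY KEY, title TEXT);
        CREATE TABLE likes (id INTEGER PRIMARY KEY, type TEXT, code INTEGER);
        CREATE TABLE news_comments (id INTEGER PRIMARY KEY, post_id INTEGER);
    """)
    conn.executemany("INSERT INTO news (id, title) VALUES (?, ?)",
                     [(i, f"Post {i}") for i in range(1, 11)])
    for i in range(1, 11):
        conn.executemany("INSERT INTO likes (type, code) VALUES ('newspost', ?)",
                         [(i,)] * i)
        conn.executemany("INSERT INTO news_comments (post_id) VALUES (?)",
                         [(i,)] * (i % 3))
    # A like of some other type must not be counted.
    conn.execute("INSERT INTO likes (type, code) VALUES ('photo', 10)")
    return conn
```

Note that the derived tables aggregate every post, not just the five being returned, which is one reason the rewrite may not pay off when the outer result set is small.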

Examples on SQL Fiddle

Finally, you may want to index the relevant fields in news_comments and likes. For this particular query I think the following indexes will help:

CREATE INDEX IX_Likes_Code_Type ON likes (code, type);
CREATE INDEX IX_newcomments_post_id ON news_comments (post_id);

Although you may need to split the first index into two:

CREATE INDEX IX_Likes_Code ON likes (code);
CREATE INDEX IX_Likes_Type ON likes (type);
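One way to confirm that an index is actually picked up is to look at the query plan before and after creating it. The sketch below does this with SQLite's `EXPLAIN QUERY PLAN` through Python's `sqlite3`, reusing the index definitions suggested above. In MySQL the equivalent check would be `EXPLAIN` on the query. The helpers `make_conn` and `plan` are hypothetical names for this illustration.

```python
import sqlite3

# The per-row lookups that the correlated subqueries perform.
LIKES_COUNT = "SELECT COUNT(*) FROM likes WHERE type = 'newspost' AND code = ?"
COMMENTS_COUNT = "SELECT COUNT(*) FROM news_comments WHERE post_id = ?"


def make_conn(with_indexes):
    """Create the two tables, optionally with the suggested indexes."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE likes (id INTEGER PRIMARY KEY, type TEXT, code INTEGER);
        CREATE TABLE news_comments (id INTEGER PRIMARY KEY, post_id INTEGER);
    """)
    if with_indexes:
        conn.executescript("""
            CREATE INDEX IX_Likes_Code_Type ON likes (code, type);
            CREATE INDEX IX_newcomments_post_id ON news_comments (post_id);
        """)
    return conn


def plan(conn, sql, params=()):
    """Return the human-readable 'detail' column of each EXPLAIN QUERY PLAN row."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]
```

Without the indexes, the plan reports a full `SCAN` of the table for every lookup. With them, it reports a `SEARCH` using the named index, which is what turns each per-post count into a cheap index probe.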

edited Feb 17, 2014 at 23:44

answered Feb 17, 2014 at 22:34

GarethD

70k reputation · 11 gold badges · 90 silver badges · 130 bronze badges

8 Comments

user3205106 Over a year ago

I'm getting the error Unknown column 'n.newscode' in 'on clause' when running this.

user3205106 Over a year ago

What do you mean exactly? They're separate tables.

GarethD Over a year ago

In your subquery that gets total_likes, you have this in the WHERE clause: `code`=`newscode`. What does newscode refer to here? I know they are separate tables, but presumably they are linked? Otherwise you are just getting the total number of likes, which will be the same for every row unless there is a relation.

user3205106 Over a year ago

newscode is the id of the current news post it's using. It looks up the likes/comments by id. newscode is a dynamically generated column for each result, as shown in the original query.

GarethD Over a year ago

I have substituted n.`id` for each instance of newscode, and the query now runs.


wumpz · Accepted Answer · 2014-02-17 22:30:55Z

0

First, check that there are helpful indexes on the columns id, post_id, and (type, code).
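As a rough illustration of "check for helping indexes", the sketch below lists each index on a table with its ordered columns and tests whether any index leads with a given set of columns. It uses SQLite's `PRAGMA index_list` / `PRAGMA index_info` through Python's `sqlite3`. In MySQL, `SHOW INDEX FROM likes` gives similar information. The helper names are hypothetical. In SQLite, an `INTEGER PRIMARY KEY` such as `news.id` is the rowid itself and does not appear in `index_list`.

```python
import sqlite3


def index_columns(conn, table):
    """Map each index on `table` to its ordered list of column names.

    `table` is interpolated into the PRAGMA, so pass trusted identifiers only.
    """
    result = {}
    for row in conn.execute(f"PRAGMA index_list({table})"):
        name = row[1]  # columns: seq, name, unique, ...
        result[name] = [info[2] for info in conn.execute(f"PRAGMA index_info({name})")]
    return result


def has_helping_index(conn, table, columns):
    """True if some index's leading columns are exactly `columns` (any order)."""
    want = set(columns)
    return any(set(cols[:len(columns)]) == want
               for cols in index_columns(conn, table).values())
```

The order-insensitive check reflects that, for equality lookups on both `type` and `code`, an index on either `(type, code)` or `(code, type)` can serve the correlated subquery.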

answered Feb 17, 2014 at 22:30

wumpz

9,321 reputation · 3 gold badges · 33 silver badges · 27 bronze badges

Comments

Phoenix · Accepted Answer · 2014-02-17 22:33:53Z

0

I assume this is T-SQL, as that is what I am most familiar with.

First I would check indexes. If those look good, then I'd check the statement itself. Take a look at your execution plan to see how it's populating your result.

SQL works backward, so it starts with your last AND statement and goes from there. It'll group them all by code, and then type, and finally give you a count.

Right now, you're grabbing everything with certain codes, regardless of date. When you stated that you want the latest, I assume there is a date column somewhere.

In order to speed things up, add another AND to your WHERE and account for the date. Either last 24 hours, last week, whatever.
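One reading of this suggestion is to add a date predicate on news so that fewer rows, and therefore fewer per-row subqueries, are considered. The sketch below assumes a hypothetical `created_at` column storing ISO-8601 date strings; the question's table may not have one. It uses SQLite through Python's `sqlite3`. Whether the extra predicate helps in MySQL depends on the execution plan, since `ORDER BY id DESC LIMIT 5` may already stop after a handful of rows.

```python
import sqlite3

# Hypothetical schema: `created_at` is an assumed column, not from the question.
SCHEMA = """
CREATE TABLE news (id INTEGER PRIMARY KEY, created_at TEXT);
CREATE TABLE likes (id INTEGER PRIMARY KEY, type TEXT, code INTEGER);
CREATE TABLE news_comments (id INTEGER PRIMARY KEY, post_id INTEGER);
"""

RECENT_NEWS = """
SELECT n.id,
       (SELECT COUNT(*) FROM likes
         WHERE type = 'newspost' AND code = n.id) AS total_likes,
       (SELECT COUNT(*) FROM news_comments
         WHERE post_id = n.id) AS total_comments
FROM news n
WHERE n.created_at >= ?   -- the extra date predicate
ORDER BY n.id DESC
LIMIT 5
"""


def recent_news(conn, cutoff):
    """Latest posts created on or after `cutoff` (ISO dates compare as strings)."""
    return conn.execute(RECENT_NEWS, (cutoff,)).fetchall()
```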

answered Feb 17, 2014 at 22:33

Phoenix

1,969 reputation · 5 gold badges · 20 silver badges · 30 bronze badges

2 Comments

GarethD Over a year ago

The assumption that the question relates to T-SQL is undermined by the fact that the question is tagged with MySQL. The statement that SQL works backwards, starting with the last AND statement, is not true at all. It starts with the table you are selecting from, then deals with any joins (in whatever order it deems fit unless you say otherwise), and then deals with your WHERE clause, again in whatever order the optimiser deems fastest.

Phoenix Over a year ago

You're right, I didn't check the tags. Obviously, SQL starts with table aggregation (FROM), but after that, data absolutely is parsed 'backward': WHERE, GROUP BY, HAVING, SELECT, ORDER BY. My suggestion was to refine the WHERE clause because, after the tables and joins are processed, that's the next step.
