Improving the performance of an SQL joined count query

Question

In my application, users can create campaigns for sending messages. When a campaign tries to send a message, one of three things can happen:

The message is suppressed and not let through
The message can't reach the recipient and is considered failed
The message is successfully delivered

To keep track of this, I have the following table:

My problem is that once the application has processed a lot of messages (more than 10 million), the query I use to show campaign statistics to the user slows down considerably (to about 15 seconds), even when only a few (about 10) campaigns are being displayed for that user.

Here is the query I'm using:

select `campaigns`.*,
    (select count(*) from `processed_messages`
        where `campaigns`.`id` = `processed_messages`.`campaign_id` and `status` = 'sent') as `messages_sent`,
    (select count(*) from `processed_messages`
        where `campaigns`.`id` = `processed_messages`.`campaign_id` and `status` = 'failed') as `messages_failed`,
    (select count(*) from `processed_messages`
        where `campaigns`.`id` = `processed_messages`.`campaign_id` and `status` = 'supressed') as `messages_supressed`
from `campaigns`
where `user_id` = 1 and `campaigns`.`deleted_at` is null
order by `updated_at` desc;

So my question is: how can I make this query run faster? I suspect there is a way to avoid running three separate subqueries against the same table, but I am not yet very experienced with MySQL syntax.

Tim Biegeleisen · Accepted Answer · 2019-04-17 14:38:58Z

3

You should write this as a single join, using conditional aggregation:

SELECT
    c.*,
    COUNT(CASE WHEN pm.status = 'sent' THEN 1 END) AS messages_sent,
    COUNT(CASE WHEN pm.status = 'failed' THEN 1 END) AS messages_failed,
    COUNT(CASE WHEN pm.status = 'suppressed' THEN 1 END) AS messages_suppressed
FROM campaigns c
LEFT JOIN processed_messages pm
    ON c.id = pm.campaign_id
WHERE
    c.user_id = 1 AND
    c.deleted_at IS NULL
GROUP BY
    c.id
ORDER BY
    c.updated_at DESC;

At first glance, SELECT c.* appears to violate the GROUP BY rule that only columns appearing in the GROUP BY clause (or inside an aggregate function) may be selected. However, assuming campaigns.id is the primary key, every other column of campaigns is functionally dependent on it, so selecting all columns from this table is valid as long as we group by the primary key.
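A minimal sketch to check that the conditional-aggregation query returns the same counts as the question's three correlated subqueries. It uses Python's built-in sqlite3 module as a stand-in for MySQL, and the table columns below are hypothetical, since the question does not show the real schema. The status values are assumed to be stored as 'sent', 'failed' and 'suppressed'; note that the question's query spells the last one 'supressed', so whichever query you use must match the spelling the application actually writes.

```python
import sqlite3

# Hypothetical schema: the question does not show the real table definitions.
# Status values are assumed to be spelled 'sent', 'failed' and 'suppressed'.
SCHEMA = """
CREATE TABLE campaigns (
    id INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL,
    deleted_at TEXT,
    updated_at TEXT
);
CREATE TABLE processed_messages (
    id INTEGER PRIMARY KEY,
    campaign_id INTEGER NOT NULL,
    status TEXT NOT NULL
);
"""

# The answer's approach: one LEFT JOIN, counts split by status with CASE.
CONDITIONAL = """
SELECT c.id,
       COUNT(CASE WHEN pm.status = 'sent' THEN 1 END) AS messages_sent,
       COUNT(CASE WHEN pm.status = 'failed' THEN 1 END) AS messages_failed,
       COUNT(CASE WHEN pm.status = 'suppressed' THEN 1 END) AS messages_suppressed
FROM campaigns c
LEFT JOIN processed_messages pm ON c.id = pm.campaign_id
WHERE c.user_id = 1 AND c.deleted_at IS NULL
GROUP BY c.id
ORDER BY c.updated_at DESC
"""

# The question's approach: three correlated subqueries per campaign.
SUBQUERIES = """
SELECT c.id,
       (SELECT COUNT(*) FROM processed_messages pm
         WHERE pm.campaign_id = c.id AND pm.status = 'sent'),
       (SELECT COUNT(*) FROM processed_messages pm
         WHERE pm.campaign_id = c.id AND pm.status = 'failed'),
       (SELECT COUNT(*) FROM processed_messages pm
         WHERE pm.campaign_id = c.id AND pm.status = 'suppressed')
FROM campaigns c
WHERE c.user_id = 1 AND c.deleted_at IS NULL
ORDER BY c.updated_at DESC
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany(
    "INSERT INTO campaigns VALUES (?, ?, ?, ?)",
    [
        (1, 1, None, "2019-04-01"),
        (2, 1, None, "2019-04-02"),          # no messages processed yet
        (3, 1, "2019-03-05", "2019-03-01"),  # soft-deleted, excluded
        (4, 2, None, "2019-04-03"),          # another user, excluded
    ],
)
statuses = ["sent"] * 3 + ["failed"] + ["suppressed"] * 2
conn.executemany(
    "INSERT INTO processed_messages (campaign_id, status) VALUES (?, ?)",
    [(1, s) for s in statuses] + [(3, "sent"), (4, "failed")],
)

conditional = conn.execute(CONDITIONAL).fetchall()
subqueries = conn.execute(SUBQUERIES).fetchall()
print(conditional)  # → [(2, 0, 0, 0), (1, 3, 1, 2)]
print(conditional == subqueries)  # → True
```

The LEFT JOIN keeps campaign 2 even though it has no messages; its counts come out as 0 because COUNT ignores the NULLs produced by the CASE expressions. Timings on SQLite say little about MySQL, so this only checks that the two forms agree, not which is faster.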

Edit:

If the query above fails on your server with an error complaining about ONLY_FULL_GROUP_BY (for example, error 1055 "... isn't in GROUP BY"), use this version instead:

SELECT c1.*, c2.messages_sent, c2.messages_failed, c2.messages_suppressed
FROM campaigns c1
INNER JOIN
(
    -- Aggregate per campaign, grouping only by the primary key,
    -- so the select list satisfies ONLY_FULL_GROUP_BY.
    SELECT
        c.id,
        COUNT(CASE WHEN pm.status = 'sent' THEN 1 END) AS messages_sent,
        COUNT(CASE WHEN pm.status = 'failed' THEN 1 END) AS messages_failed,
        COUNT(CASE WHEN pm.status = 'suppressed' THEN 1 END) AS messages_suppressed
    FROM campaigns c
    LEFT JOIN processed_messages pm
        ON c.id = pm.campaign_id
    WHERE
        c.user_id = 1 AND
        c.deleted_at IS NULL
    GROUP BY
        c.id
) c2
    ON c1.id = c2.id
-- c2 only exposes id and the counts, so sort on the outer campaigns alias.
ORDER BY
    c1.updated_at DESC;
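A minimal sketch that runs the derived-table variant end to end, again using SQLite through Python's sqlite3 module as a stand-in for MySQL/MariaDB, with a hypothetical schema and made-up rows. The outer query returns every campaign column (c1.*) alongside the counts, while the inner query groups only by c.id. Note that the outer ORDER BY has to reference c1.updated_at, since the derived table c2 exposes only id and the three counts. SQLite does not enforce ONLY_FULL_GROUP_BY, so this checks that the query is well-formed and returns the expected rows; it does not reproduce the MariaDB error.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema and data; the real table definitions are not shown.
conn.executescript("""
CREATE TABLE campaigns (
    id INTEGER PRIMARY KEY, user_id INTEGER, name TEXT,
    deleted_at TEXT, updated_at TEXT
);
CREATE TABLE processed_messages (
    id INTEGER PRIMARY KEY, campaign_id INTEGER, status TEXT
);
INSERT INTO campaigns VALUES
    (1, 1, 'spring', NULL, '2019-04-01'),
    (2, 1, 'summer', NULL, '2019-04-02');
INSERT INTO processed_messages (campaign_id, status) VALUES
    (1, 'sent'), (1, 'sent'), (1, 'failed'), (1, 'suppressed');
""")

# Inner query: aggregate by primary key only.
# Outer join: bring back the remaining campaign columns.
QUERY = """
SELECT c1.*, c2.messages_sent, c2.messages_failed, c2.messages_suppressed
FROM campaigns c1
INNER JOIN (
    SELECT c.id,
           COUNT(CASE WHEN pm.status = 'sent' THEN 1 END) AS messages_sent,
           COUNT(CASE WHEN pm.status = 'failed' THEN 1 END) AS messages_failed,
           COUNT(CASE WHEN pm.status = 'suppressed' THEN 1 END) AS messages_suppressed
    FROM campaigns c
    LEFT JOIN processed_messages pm ON c.id = pm.campaign_id
    WHERE c.user_id = 1 AND c.deleted_at IS NULL
    GROUP BY c.id
) c2 ON c1.id = c2.id
ORDER BY c1.updated_at DESC
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    # Each row: id, user_id, name, deleted_at, updated_at, sent, failed, suppressed
    print(row)
```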

edited Apr 17, 2019 at 14:38

answered Apr 17, 2019 at 13:49

8 Comments

Nonetallt Over a year ago

Thanks for the quick answer. I am currently testing the query speed, and it looks very promising: it is very fast when there are campaigns with no processed messages at all (improved from ~15 seconds to 0.0017 seconds). The improvement seems smaller when there are multiple campaigns with messages, but I doubt it would be worse than the existing method even in that case. I will accept the answer after running a couple of tests.

Nonetallt Over a year ago

For some reason, my original query seems to be about twice as fast when tested with ~520,000 messages split among 100 campaigns. However, it does not seem to scale as well.

Tim Biegeleisen Over a year ago

@Nonetallt I have been staring at my answer for a while, and adding an index to the campaigns table might be a possibility. The catch is that, because you want to select every column from that table, MySQL might not use the index unless it covered every column. Such an index could be a storage and performance hog, even though it would speed up selects.

Nonetallt Over a year ago

Somehow the speed in my application is still better with your answer, even though phpMyAdmin shows the query as being slower. However, it does not work on my production server, which actually runs MariaDB. I get the following error message: Syntax error or access violation: 1055 'database_.c.name' isn't in GROUP BY. Do you have an idea how to make this answer compatible with MariaDB (even though I tagged the original question with MySQL)?

Tim Biegeleisen Over a year ago

There are two ways to fix that. Either modify the query so that my answer appears in a subquery, to which you join the campaigns table again, or turn off ONLY_FULL_GROUP_BY mode in MariaDB. There is nothing wrong with turning it off, at least not for my answer, which is actually ANSI compliant.
