In my application, users can create campaigns for sending messages. When a campaign tries to send a message, one of three things can happen:

  1. The message is suppressed and not let through
  2. The message can't reach the recipient and is considered failed
  3. The message is successfully delivered

To keep track of this, I have the following table:

[Image: processed_messages table structure]

My problem is that when the application has processed a lot of messages (more than 10 million), the query I use for showing campaign statistics for the user slows down by a considerable margin (~ 15 seconds), even when there are only a few (~ 10) campaigns being displayed for the user.

Here is the query I'm using:

select
    `campaigns`.*,
    (select count(*) from `processed_messages`
        where `campaigns`.`id` = `processed_messages`.`campaign_id` and `status` = 'sent') as `messages_sent`,
    (select count(*) from `processed_messages`
        where `campaigns`.`id` = `processed_messages`.`campaign_id` and `status` = 'failed') as `messages_failed`,
    (select count(*) from `processed_messages`
        where `campaigns`.`id` = `processed_messages`.`campaign_id` and `status` = 'supressed') as `messages_supressed`
from `campaigns`
where `user_id` = 1 and `campaigns`.`deleted_at` is null
order by `updated_at` desc;

So my question is: how can I make this query run faster? I believe there should be some way of not having to use sub-queries multiple times but I am not very experienced with MySQL syntax yet.

1 Answer

You should write this as a single join, using conditional aggregation:

SELECT
    c.*,
    COUNT(CASE WHEN pm.status = 'sent' THEN 1 END) AS messages_sent,
    COUNT(CASE WHEN pm.status = 'failed' THEN 1 END) AS messages_failed,
    COUNT(CASE WHEN pm.status = 'suppressed' THEN 1 END) AS messages_suppressed
FROM campaigns c
LEFT JOIN processed_messages pm
    ON c.id = pm.campaign_id
WHERE
    c.user_id = 1 AND
    c.deleted_at IS NULL
GROUP BY
    c.id
ORDER BY
    c.updated_at DESC;

It should be noted that at first glance, doing SELECT c.* appears to be a violation of the GROUP BY rules which say that only columns which appear in the GROUP BY clause can be selected. However, assuming that campaigns.id is the primary key column, then there is nothing wrong with selecting all columns from this table, provided that we aggregate by the primary key.
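
To see why COUNT(CASE ...) yields per-status counts: a CASE expression without an ELSE branch returns NULL when no branch matches, and COUNT(expr) skips NULLs. The following is a minimal sketch on a throwaway table; the table name demo_messages and its rows are hypothetical and only serve the illustration.

```sql
-- Hypothetical throwaway data, only to show how the counting works.
CREATE TEMPORARY TABLE demo_messages (
    campaign_id INT,
    status VARCHAR(20)
);

INSERT INTO demo_messages (campaign_id, status) VALUES
    (1, 'sent'), (1, 'sent'), (1, 'failed'), (2, 'suppressed');

-- COUNT(*) counts every row in the group; COUNT(CASE ...) counts only
-- the rows where the CASE expression is not NULL, i.e. the matching status.
SELECT
    campaign_id,
    COUNT(*) AS total,
    COUNT(CASE WHEN status = 'sent' THEN 1 END) AS messages_sent,
    COUNT(CASE WHEN status = 'failed' THEN 1 END) AS messages_failed
FROM demo_messages
GROUP BY campaign_id
ORDER BY campaign_id;
-- campaign 1: total 3, messages_sent 2, messages_failed 1
-- campaign 2: total 1, messages_sent 0, messages_failed 0
```

The same rule explains why the LEFT JOIN in the answer reports 0 for a campaign with no messages: the single joined row has a NULL status, so every CASE expression evaluates to NULL and nothing is counted.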

Edit:

If the above answer does not run on your MySQL server version, and the error message complains about ONLY_FULL_GROUP_BY, then use this version:

SELECT c1.*, c2.messages_sent, c2.messages_failed, c2.messages_suppressed
FROM campaigns c1
INNER JOIN
(
    SELECT
        c.id,
        COUNT(CASE WHEN pm.status = 'sent' THEN 1 END) AS messages_sent,
        COUNT(CASE WHEN pm.status = 'failed' THEN 1 END) AS messages_failed,
        COUNT(CASE WHEN pm.status = 'suppressed' THEN 1 END) AS messages_suppressed
    FROM campaigns c
    LEFT JOIN processed_messages pm
        ON c.id = pm.campaign_id
    WHERE
        c.user_id = 1 AND
        c.deleted_at IS NULL
    GROUP BY
        c.id
) c2
    ON c1.id = c2.id
ORDER BY
    c1.updated_at DESC;

8 Comments

Thanks for the quick answer. I am currently testing the querying speed and it seems very promising and very fast when there are campaigns with no messages processed at all (improved from ~15 seconds to 0.0017). The performance improvement does not seem quite as large when there are multiple campaigns with messages but I doubt it would be worse than the existing method even in that case. I will accept the answer after running a couple of tests.
For some reason, my original query seems to be about twice as fast when tested with ~520,000 messages split among 100 campaigns. However, it does not seem to scale quite so well.
@Nonetallt I have been staring at my answer for a while, and adding an index to the campaigns table might be a possibility. But the thing is, because you want to select every column from that table, MySQL might not use the index unless it covered every column. Such an index could be a storage and performance hog, even though it would speed up selects.
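
A rough sketch of the index idea from the comment above. Both index names are hypothetical, and the second index, on processed_messages, is an assumption that goes beyond what the comment proposes. EXPLAIN is the way to check whether the optimizer actually picks either of them on your data.

```sql
-- Hypothetical index on the filter columns of campaigns.
-- The optimizer may still ignore it because the query selects every column.
CREATE INDEX idx_campaigns_user_deleted ON campaigns (user_id, deleted_at);

-- Assumption, not part of the comment above: a composite index so that
-- per-status counts can be read from the index rather than the table rows.
CREATE INDEX idx_pm_campaign_status ON processed_messages (campaign_id, status);

-- Check which indexes are actually chosen for the aggregation.
EXPLAIN
SELECT c.id, COUNT(CASE WHEN pm.status = 'sent' THEN 1 END) AS messages_sent
FROM campaigns c
LEFT JOIN processed_messages pm
    ON c.id = pm.campaign_id
WHERE c.user_id = 1 AND c.deleted_at IS NULL
GROUP BY c.id;
```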
It seems that somehow the speed in my application is still better with your answer, even though phpMyAdmin shows the query as being slower. However this does not seem to work on my production server which actually uses MariaDB. I get the following error message: Syntax error or access violation: 1055 'database_.c.name' isn't in GROUP BY. Do you have an idea of how to make this answer compatible with MariaDB? (even though I tagged the original question with MySQL)
There are two ways to fix that. Either modify the query so that my answer appears in a subquery, to which you join the campaigns table again, or turn off ONLY_FULL_GROUP_BY mode in MariaDB. There isn't anything evil about turning it off, at least not in the case of my answer, which is actually ANSI compliant.
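
The second option in the comment above could look roughly like this. It is a sketch that changes the mode for the current session only; whether you are allowed to change sql_mode, and whether a session-level or server-level change suits you, depends on your setup.

```sql
-- Inspect the current mode list first.
SELECT @@sql_mode;

-- Remove ONLY_FULL_GROUP_BY for the current session only.
SET SESSION sql_mode = (SELECT REPLACE(@@sql_mode, 'ONLY_FULL_GROUP_BY', ''));
```

The first option needs no configuration change: it is the subquery version shown in the answer's edit.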