EDIT: After looking at some of the answers here and hours of research, my team came to the conclusion there was most likely no way to optimize this further than the 4.5 seconds we were able to achieve (unless maybe with partitioning on offers_clicks, but that would have some ugly side-effects). Eventually, after lots of brainstorming, we decided to split both queries, create two sets of user ids (one from users table and one from offers_clicks), and compare them with set in Python. The set of ids from users table is still pulled from SQL, but we decided to move offers_clicks to Lucene and also added some caching on top of it, so that's where the other set of ids is now pulled from. The end result is that its down to about half a second with cache and 0.9s without cache.
Start of original post: I have trouble getting a query optimized. The first version of the query is fine, but the moment offers_clicks is joined in the 2nd query, the query becomes rather slow. Users table contains 10 million rows, offers_clicks contains 53 million rows.
Acceptable performance:
SELECT count(distinct(users.id)) AS count_1
FROM users USE index (country_2)
WHERE users.country = 'US'
AND users.last_active > '2015-02-26';
1 row in set (0.35 sec)
Bad:
SELECT count(distinct(users.id)) AS count_1
FROM offers_clicks USE index (user_id_3), users USE index (country_2)
WHERE users.country = 'US'
AND users.last_active > '2015-02-26'
AND offers_clicks.user_id = users.id
AND offers_clicks.date > '2015-02-14'
AND offers_clicks.ranking_score < 3.49
AND offers_clicks.ranking_score > 0.24;
1 row in set (7.39 sec)
Here's how it looks without specificying any indexes (even worse):
SELECT count(distinct(users.id)) AS count_1
FROM offers_clicks, users
WHERE users.country IN ('US')
AND users.last_active > '2015-02-26'
AND offers_clicks.user_id = users.id
AND offers_clicks.date > '2015-02-14'
AND offers_clicks.ranking_score < 3.49
AND offers_clicks.ranking_score > 0.24;
1 row in set (17.72 sec)
Explain:
explain SELECT count(distinct(users.id)) AS count_1 FROM offers_clicks USE index (user_id_3), users USE index (country_2) WHERE users.country IN ('US') AND users.last_active > '2015-02-26' AND offers_clicks.user_id = users.id AND offers_clicks.date > '2015-02-14' AND offers_clicks.ranking_score < 3.49 AND offers_clicks.ranking_score > 0.24;
+----+-------------+---------------+-------+---------------+-----------+---------+------------------------------+--------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------------+-------+---------------+-----------+---------+------------------------------+--------+--------------------------+
| 1 | SIMPLE | users | range | country_2 | country_2 | 14 | NULL | 245014 | Using where; Using index |
| 1 | SIMPLE | offers_clicks | ref | user_id_3 | user_id_3 | 4 | dejong_pointstoshop.users.id | 270153 | Using where; Using index |
+----+-------------+---------------+-------+---------------+-----------+---------+------------------------------+--------+--------------------------+
Explain without specifying any indexes:
mysql> explain SELECT count(distinct(users.id)) AS count_1 FROM offers_clicks, users WHERE users.country IN ('US') AND users.last_active > '2015-02-26' AND offers_clicks.user_id = users.id AND offers_clicks.date > '2015-02-14' AND offers_clicks.ranking_score < 3.49 AND offers_clicks.ranking_score > 0.24;
+----+-------------+---------------+-------+------------------------------------------------------------------------+-----------+---------+------------------------------+--------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------------+-------+------------------------------------------------------------------------+-----------+---------+------------------------------+--------+--------------------------+
| 1 | SIMPLE | users | range | PRIMARY,last_active,country,last_active_2,country_2 | country_2 | 14 | NULL | 221606 | Using where; Using index |
| 1 | SIMPLE | offers_clicks | ref | user_id,user_id_2,date,date_2,date_3,ranking_score,user_id_3,user_id_4 | user_id_2 | 4 | dejong_pointstoshop.users.id | 3 | Using where |
+----+-------------+---------------+-------+------------------------------------------------------------------------+-----------+---------+------------------------------+--------+--------------------------+
Here's a whole bunch of indexes I tried with not too much success:
+---------------+------------+-----------------------------+--------------+-----------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+---------------+------------+-----------------------------+--------------+-----------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| offers_clicks | 1 | user_id_3 | 1 | user_id | A | 198 | NULL | NULL | | BTREE | | |
| offers_clicks | 1 | user_id_3 | 2 | ranking_score | A | 198 | NULL | NULL | | BTREE | | |
| offers_clicks | 1 | user_id_3 | 3 | date | A | 198 | NULL | NULL | | BTREE | | |
| offers_clicks | 1 | user_id_2 | 1 | user_id | A | 17838712 | NULL | NULL | | BTREE | | |
| offers_clicks | 1 | user_id_2 | 2 | date | A | 53516137 | NULL | NULL | | BTREE | | |
| offers_clicks | 1 | user_id_4 | 1 | user_id | A | 198 | NULL | NULL | | BTREE | | |
| offers_clicks | 1 | user_id_4 | 2 | date | A | 198 | NULL | NULL | | BTREE | | |
| offers_clicks | 1 | user_id_4 | 3 | ranking_score | A | 198 | NULL | NULL | | BTREE | | |
| users | 1 | country_2 | 1 | country | A | 14 | NULL | NULL | | BTREE | | |
| users | 1 | country_2 | 2 | last_active | A | 8048529 | NULL | NULL | | BTREE | | |
Simplified users schema:
+---------------------------------+---------------+------+-----+---------------------+----------------+
| Field | Type | Null | Key | Default | Extra |
+---------------------------------+---------------+------+-----+---------------------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| country | char(2) | NO | MUL | | |
| last_active | datetime | NO | MUL | 2000-01-01 00:00:00 | |
Simplified offers clicks schema:
+-----------------+------------------+------+-----+---------------------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-----------------+------------------+------+-----+---------------------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| user_id | int(11) | NO | MUL | 0 | |
| offer_id | int(11) unsigned | NO | MUL | NULL | |
| date | datetime | NO | MUL | 0000-00-00 00:00:00 | |
| ranking_score | decimal(5,2) | NO | MUL | 0.00 | |
distinct(user.id)is better asdistinct user.idbecause "distinct is not a function"