1

I need to find the count of unique user accounts in a table for the last 5 months. The table contains millions of rows.

select COUNT(distinct(account)) as total_unique,
 COUNT(distinct(IF( Column1!=0 OR Column2!=0, account, null)))
from table_name where date(event_date) >= date('2014-04-01') and date(event_date) <=date('2014-08-31');

This query currently takes more than 10 minutes to return. We have indexes on the event_date column and on the account column of this table. We are using MySQL. Could you please help us?

2 Answers

3

In your query the index can't be used because the DATE() function is applied to the event_date column. Compare the bare column against a range instead:

WHERE event_date BETWEEN CAST('2014-04-01' AS DATE)
                         AND CAST('2014-09-01' AS DATE) - INTERVAL 1 SECOND

You'll get the same result, but now an index range scan can be used.

Or, as ypercube suggested in a comment, use a half-open range, which also covers any fractional-second values on the last day:

WHERE event_date >= CAST('2014-04-01' AS DATE)
  AND event_date < CAST('2014-09-01' AS DATE)
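
Putting the pieces together, a minimal sketch of the full query with the half-open range might look like the following. The index name idx_event_account_cols, its column order, and the total_nonzero alias are assumptions for illustration and do not come from the question. A composite index that starts with event_date and includes the other referenced columns may let MySQL answer the query from the index alone, but whether the optimizer picks it depends on the data and the server version, so it is worth checking with EXPLAIN.

```sql
-- Hypothetical covering index (name and column order are assumptions).
ALTER TABLE table_name
  ADD INDEX idx_event_account_cols (event_date, account, Column1, Column2);

-- No function wraps event_date, so a range scan on the index is possible.
EXPLAIN
SELECT COUNT(DISTINCT account) AS total_unique,
       COUNT(DISTINCT IF(Column1 != 0 OR Column2 != 0, account, NULL)) AS total_nonzero
FROM table_name
WHERE event_date >= '2014-04-01'
  AND event_date <  '2014-09-01';
```

If the plan still shows type ALL with key NULL, the index is not being considered for this table, and the index definitions themselves are the next thing to inspect.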

4 Comments

Good. Even better, in my opinion: WHERE event_date >= CAST('2014-04-01' AS DATE) AND event_date < CAST('2014-09-01' AS DATE)
I agree, I will change it.
Thanks, Gervs, for the suggestions, but the performance is unchanged. I found that the indexes are somehow not being used, even though we have indexes on the columns. Please see the EXPLAIN output below.

+----+-------------+---------+------+---------------+------+---------+------+---------+-----------------------------+
| id | select_type | table   | type | possible_keys | key  | key_len | ref  | rows    | Extra                       |
+----+-------------+---------+------+---------------+------+---------+------+---------+-----------------------------+
|  1 | SIMPLE      | user_db | ALL  | NULL          | NULL | NULL    | NULL | 5764552 | Using where; Using filesort |
+----+-------------+---------+------+---------------+------+---------+------+---------+-----------------------------+
I also tried the WHERE clause below, following ypercube's suggestion, but the issue remains: WHERE cast(event_date as date) >= CAST('2014-05-31' AS DATE) - INTERVAL 5 MONTH AND cast(event_date as date) < CAST('2014-05-31' AS DATE) and cast(event_date as DATE) < NOW() - INTERVAL 2 DAY GROUP BY MONTH(event_date);
-1

If column1 and column2 are the same for every repeated record of an account, consider moving the SELECT and GROUP BY on account, column1, and column2 into a derived table. This makes the counting cheaper and removes the expensive COUNT(DISTINCT ...).

SELECT
    COUNT(a.account) AS total_unique,
    SUM(CASE WHEN a.column1 <> 0 OR a.column2 <> 0 THEN 1 ELSE 0 END) AS total_nonzero
FROM
   (
        -- One row per (account, column1, column2) combination in the date range.
        SELECT account, column1, column2
        FROM table_name
        WHERE
             DATE(event_date) >= DATE('2014-04-01') AND
             DATE(event_date) <= DATE('2014-08-31')
        GROUP BY account, column1, column2
   ) AS a;
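
The rewrite above only matches COUNT(DISTINCT account) if each account carries a single (column1, column2) combination within the date range; otherwise an account falls into several groups and is counted more than once. One way to test that premise, sketched here with the table and column names from the question and the half-open date range from the other answer, is to look for accounts whose values vary:

```sql
-- Lists up to 10 accounts whose column1 or column2 values differ across rows.
-- An empty result suggests the premise holds. MIN/MAX ignore NULLs, so rows
-- with NULL in these columns would need a separate check.
SELECT account
FROM table_name
WHERE event_date >= '2014-04-01'
  AND event_date <  '2014-09-01'
GROUP BY account
HAVING MIN(column1) <> MAX(column1)
    OR MIN(column2) <> MAX(column2)
LIMIT 10;
```

If this returns any rows, the derived-table counts will differ from the original COUNT(DISTINCT ...) results.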

1 Comment

This will produce an error (event_date not a column of derived table a)
