2

So I want to compare new users to returning users in a table by month. I have a table that contains each action with a username and a date stamp.

I can easily pull users that performed an action in, for example, January 2011. To see if each user is new, I then need to check their username against all previous records (prior to January 2011).

In my fumblings I came up with the following:

  SELECT ini.username,
         MIN(ini.datetime) AS firstAction,
         COUNT(ini.datetime) AS numMonth,
         (SELECT COUNT(*) 
            FROM tableActions tot
           WHERE tot.username = ini.username
             AND tot.datetime < '201101%' 
             AND tot.datetime > '201001%') AS numTotal
    FROM tableActions ini
   WHERE DATETIME >= '201101%' 
     AND DATETIME < '201102%'
GROUP BY ini.username
ORDER BY firstAction

It doesn't error, but it doesn't finish either. Seems to be quite intense.

6
  • What's your question? How to "fix" it? State your requirements. Commented Oct 14, 2011 at 13:26
  • Is the data type of the datetime column varchar? That's a bad idea, and it would explain why your query is slow. If the column really is a datetime, then I fail to understand what kind of comparison >= '201101%' is meant to be. Commented Oct 14, 2011 at 13:35
  • Agreed, what is the column declaration for "datetime" ? Commented Oct 14, 2011 at 14:06
  • The datetime column is varchar, I can fix that though, thanks for pointing that out. Commented Oct 14, 2011 at 14:45
  • @TomalakGeret'kal the question is how I can first query for all actions in January 2011, and then compare that number to all previous actions. Basically, I'm looking for new users performing actions in January by eliminating anyone who has performed an action previously. Hope that's better. Commented Oct 14, 2011 at 14:49

4 Answers

5

You can re-write the query to be (assuming tableactions.datetime is a DATETIME data type):

   SELECT ini.username,
          MIN(ini.datetime) AS firstAction,
          COUNT(ini.datetime) AS numMonth,
          x.numTotal
     FROM tableActions ini
LEFT JOIN (SELECT tot.username,
                  COUNT(*) AS numTotal
             FROM tableActions tot
            WHERE tot.datetime > '2010-01-01'
              AND tot.datetime < '2011-01-01'
         GROUP BY tot.username) x ON x.username = ini.username
    WHERE ini.datetime >= '2011-01-01'
      AND ini.datetime < '2011-02-01'
 GROUP BY ini.username
 ORDER BY firstAction

Might help to have an index on username at a minimum, though a covering index using username, datetime is worth considering.
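As a rough sketch of the index suggestion, the snippet below uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption made only so the example runs anywhere; the CREATE INDEX statement has the same form in both). The index name idx_user_dt and the sample rows are invented for illustration. The January filter is written as a half-open range so that an action late on 31 January still counts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tableActions (username TEXT, datetime TEXT)")
# Composite index: username serves the join, datetime the range filters.
conn.execute("CREATE INDEX idx_user_dt ON tableActions (username, datetime)")

# Hypothetical sample data: alice is a returning user, bob is new.
conn.executemany(
    "INSERT INTO tableActions VALUES (?, ?)",
    [
        ("alice", "2010-06-01 09:00:00"),
        ("alice", "2011-01-05 10:00:00"),
        ("bob", "2011-01-31 18:30:00"),
    ],
)

query = """
   SELECT ini.username,
          MIN(ini.datetime) AS firstAction,
          COUNT(ini.datetime) AS numMonth,
          x.numTotal
     FROM tableActions ini
LEFT JOIN (SELECT tot.username, COUNT(*) AS numTotal
             FROM tableActions tot
            WHERE tot.datetime >= '2010-01-01'
              AND tot.datetime < '2011-01-01'
         GROUP BY tot.username) x ON x.username = ini.username
    WHERE ini.datetime >= '2011-01-01'
      AND ini.datetime < '2011-02-01'
 GROUP BY ini.username
 ORDER BY firstAction
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

A numTotal of NULL (None in Python) marks a user with no actions in the earlier window, which is the "new user" case the question asks about.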

The datetime comparison looks suspect - LIKE is the only operator that supports wildcards.
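To illustrate the point about wildcards, here is a small check using Python's built-in sqlite3 (a stand-in for MySQL; exact string ordering in MySQL depends on the column's collation, so treat this as a sketch). With LIKE the % matches any suffix, while with >= it is compared as an ordinary character.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
like_match, ge_long, ge_prefix = conn.execute(
    """
    SELECT '20110115' LIKE '201101%',  -- % acts as a wildcard
           '20110115' >= '201101%',    -- % is just a character here
           '201101'   >= '201101%'     -- so a bare '201101' sorts before it
    """
).fetchone()
print(like_match, ge_long, ge_prefix)
```

The range comparison only appears to work on a varchar column because % happens to sort before the digits in a byte-wise comparison; it is not doing any pattern matching.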


4 Comments

had the same thought - count(datetime) ?
datetime is varchar. Should I work to fix that first and then try running the query? Thank you.
Can't say if the query will work as-is but the idea is right. BTW you can use COUNT(1) instead of COUNT(*).
@Tim Cutting: Yes - it's not trivial to change things when you have data but proper data typing will make finding things easier/perform better.
1

I think a simple table-to-itself join with a suitable where clause will be sufficient (this query is straight from my head, not tested):

SELECT    curr_activity.username, COUNT(prev_activity.username) AS did_something_in_the_past
FROM      tableActions AS curr_activity
LEFT JOIN tableActions AS prev_activity ON curr_activity.username = prev_activity.username
                                       AND prev_activity.datetime <  '2011-01-01'
WHERE     curr_activity.datetime >= '2011-01-01' AND curr_activity.datetime < '2011-02-01'
GROUP BY  curr_activity.username

Indexes do matter. You must index the username and datetime columns, and the datetime column must be a datetime or a similar data type.
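One detail of the self-join above is worth checking on sample data: where the filter on prev_activity goes. The sketch below (Python's sqlite3 as a stand-in for MySQL, shortened aliases, and invented rows) runs the same query twice, once with the date filter in WHERE and once in the ON clause. In WHERE, the filter discards the NULL rows the LEFT JOIN produces, so users with no earlier activity vanish from the result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tableActions (username TEXT, datetime TEXT)")
# Hypothetical sample data: alice is returning, bob is new in January 2011.
conn.executemany(
    "INSERT INTO tableActions VALUES (?, ?)",
    [
        ("alice", "2010-06-01 09:00:00"),
        ("alice", "2011-01-05 10:00:00"),
        ("bob", "2011-01-12 08:00:00"),
    ],
)

template = """
SELECT    curr.username, COUNT(prev.username) AS past_actions
FROM      tableActions AS curr
LEFT JOIN tableActions AS prev ON curr.username = prev.username {on_extra}
WHERE     curr.datetime >= '2011-01-01' AND curr.datetime < '2011-02-01' {where_extra}
GROUP BY  curr.username
ORDER BY  curr.username
"""
date_filter = "AND prev.datetime < '2011-01-01'"

# Filter in WHERE: behaves like an inner join, new users are dropped.
in_where = conn.execute(
    template.format(on_extra="", where_extra=date_filter)
).fetchall()
# Filter in ON: new users are kept with a count of 0.
in_on = conn.execute(
    template.format(on_extra=date_filter, where_extra="")
).fetchall()

print(in_where)
print(in_on)
```

Note that the count is multiplied by the number of current-month rows per user, so it is best read as a zero or non-zero flag rather than an exact total.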


0
SELECT username,
MIN(datetime) AS firstAction,
MAX(datetime) AS numMonth,
COUNT(*) AS numTotal
FROM tableActions
WHERE datetime BETWEEN '201001%' AND '201102%'
GROUP BY username
HAVING numTotal > 1
ORDER BY username

I think this collapsed version is what you need?

1 Comment

The date criteria are different - might be a typo, otherwise I agree
0

I think you can replace

SELECT COUNT(*) 
        FROM tableActions tot
       WHERE tot.username = ini.username
         AND tot.datetime < '201101%' 
         AND tot.datetime > '201001%'

with

SELECT 1
        FROM tableActions tot
       WHERE tot.username = ini.username
         AND tot.datetime < '201101%' 
         AND tot.datetime > '201001%' LIMIT 1

, so it does not have to loop through all the records and count them.
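A minimal sketch of this existence check, run through Python's sqlite3 as a stand-in for MySQL, with invented sample rows and proper date strings instead of the '201101%' literals (both are assumptions for illustration). The subquery returns 1 as soon as one earlier row is found, and NULL when there is none.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tableActions (username TEXT, datetime TEXT)")
# Hypothetical sample data: alice has earlier activity, bob does not.
conn.executemany(
    "INSERT INTO tableActions VALUES (?, ?)",
    [
        ("alice", "2010-06-01 09:00:00"),
        ("alice", "2011-01-05 10:00:00"),
        ("alice", "2011-01-20 11:00:00"),
        ("bob", "2011-01-12 08:00:00"),
    ],
)

query = """
SELECT DISTINCT ini.username,
       (SELECT 1
          FROM tableActions tot
         WHERE tot.username = ini.username
           AND tot.datetime < '2011-01-01'
         LIMIT 1) AS seenBefore
  FROM tableActions ini
 WHERE ini.datetime >= '2011-01-01'
   AND ini.datetime < '2011-02-01'
 ORDER BY ini.username
"""
rows = conn.execute(query).fetchall()

# Label each January user: NULL (None) means no earlier action was found.
labels = {user: ("returning" if seen else "new") for user, seen in rows}
print(labels)
```

An EXISTS (...) subquery expresses the same idea and returns a plain true/false value instead of 1 or NULL.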

1 Comment

Yes, TOP is for SQL Server, so you need to put LIMIT 1 at the end of the sub-query instead. I am not a big expert on MySQL, so I just assume it will work.
