1

I have to make some statistics for my application, so I need an algorithm with a performance as best as possible. I have some several question.

I have a data structure like this in the mysql database:

user_id    group_id     date
1          5            2012-11-20
1          2            2012-11-01
1          4            2012-11-01
1          3            2012-10-15
1          9            2013-01-18
...

So I need to find the group of some user at a specific date. For example, the group of the user 1 at date 2012-11-15 (15 november 2012) should return the most recent group, which is 2 and 4 (many group at the same time) at date 2012-11-01 (the closest and smaller date).

Normally, I could do a Select where date <= chosen date order by date desc, etc... but that's not the point because if I have 1000 users, it will need 1000 requests to have all the result.

So here are some question:

  1. I have already using the php method to loop through the array to avoid the high number of mysql request, but it's still not good because the array size may be 10000+. Using a foreach (or for?) is quite costly. So my question is if given an array, ordered by date (desc or asc), what's the fastest way to find the closest index of the element which contain a date smaller (or greater) than a given date; beside using a for or foreach loop to loop through each element.
  2. If there is no solution for the first question, then what kind of data structure would you suggest for this kind of problem.

Note: the date is in mysql format, it's not converted in timestamp when you stored it in an array

EDIT: this is a sql fiddle http://sqlfiddle.com/#!2/dc28d/1 For dos_id = 6, t="2012-11-01" it should returns only 2 and 5 at date "2010-12-10 13:16:58"

5
  • 1
    Do you need to find the group of "some user at a specific date" or "all users at a specific date"? Commented Feb 4, 2013 at 9:52
  • 1
    why not depend on query to select only specific groups Commented Feb 4, 2013 at 9:52
  • Your question is unclear. So you are pulling all the data from the database and then looping in php to pick out the closest date? Doesn't seem efficient. Commented Feb 4, 2013 at 9:54
  • @lc. It's better to extract the group of all users at a specific date, but they will be stored in an array indexed by user_id for a later easy access Commented Feb 4, 2013 at 9:55
  • @Akam & bmewsing: Using solely SQL won't help because you can't get what you want. Conditions: many groups at a given date (so you can't use limit 1 or limit # because you don't know the # is), the least mysql request as possible Commented Feb 4, 2013 at 9:58

2 Answers 2

2

Not sure why you'd want to do this in php. Here's some SQL using joins instead to get most recent group(s) for all users given a date. Make sure you've got indexes on date and userid.

SELECT *
FROM test t1
LEFT JOIN test t2
ON t1.userid = t2.userid AND t2.thedate <= '2012-11-15' AND t2.thedate > t1.thedate
WHERE t1.thedate <= '2012-11-15' AND t2.userid IS NULL;

SQLfiddle

Or using your SQLFiddle

SELECT t1.*
FROM dossier_dans_groupe t1
LEFT JOIN dossier_dans_groupe t2
ON t1.dos_id = t2.dos_id AND t2.updated_at <= '2012-11-01' 
   AND t2.updated_at > t1.updated_at
WHERE t1.updated_at <= '2012-11-01' AND t2.dos_id IS NULL;
Sign up to request clarification or add additional context in comments.

2 Comments

Thx. I don't know how did you do it but I can't figure out this sql :O
though problem is resolved. Do you have answer for my first question (see above). I'll probably need it for later use: "what's the best way to find the index of an element of a date ordered array which has the date closest to a given date, beside using foreach loop to compare each loop"
1

This would give you a list of all users and their groups (1 row per group) for the latest date that is smaller than the one you specify (2012-11-15 below).

SELECT user_id, group_id, date FROM table WHERE date <= '2012-11-15' AND NOT EXISTS (SELECT 1 FROM table test WHERE test.user_id = table.user_id AND test.date > table.date and test.date <= '2012-11-15')

6 Comments

Edited to add a check against the specified date in the sub-query as well.
Not work. It extracts all the data which has the date smaller than the given date, it doesn't extracts the date of the closest date. I'll post a sqlfiddle in a few minutes
No, you have a typo in that fiddle. In the sub-query it should have said AND t2.updated_at > t1.updated_at but you had AND t2.updated_at > t2.updated_at.
you are right! I'm sorry. Your version is slightly faster of 0.01s, with 8k records. You answered first, get first
Regarding the first question, there are many search algorithms that will work with a slight modification. Since you're not looking for an exact value, you need to cut it down to two values; one bigger and one smaller. Either pick the one that is on the "right side" of your value (bigger or smaller), or calculate which one of the two have a smaller difference. A naive implementation is to recursively cut the array in half, use the half that contains your value (middle value being greater or not).
|

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.