I'm building a script that traverses Twitter users, analyses the language of their tweets and, if the right language is found, adds all of their friends and followers to a queue. Those users are in turn picked from the queue and the process is repeated over and over.

To keep the db fast, I'm using the same table for all the different states a user can have in the queue ("to be analyzed for language" = 1, "to be fetched" = 2, "in progress" = 9, "done" = 99 and "blocked" = -1). That way I can just add all friends/followers to the table without having to check whether the person already exists in it (each Twitter user should of course only be analyzed once).
INSERT IGNORE INTO queue (tid,queuetype) VALUES (1,1),(2,1) ... (xxx,1);
This is quite fast. But as the table grows (a couple of million rows), selecting the next user from the queue becomes slower and slower.
Right now, I do it this way ($uniqueid is actually the process number):
UPDATE queue SET k='$uniqueid', queuetype = '9' WHERE k='0' AND queuetype = '1' LIMIT 1
followed by:
SELECT tid FROM queue WHERE k='$uniqueid' LIMIT 1
I then do all the magic and finally change queuetype to the new state (done, blocked, etc.).
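That last step is just another status flip, something along these lines ('99' = done per the mapping above; $tid stands for the id returned by the SELECT):

-- '$tid' is the id fetched above; '99' is the "done" state
UPDATE queue SET queuetype = '99' WHERE tid = '$tid'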
Can the solution be optimized further? The "SELECT tid" is very slow and takes multiple seconds to run. If I add an index to k, the SELECT becomes faster, but the UPDATE becomes very slow, and the result is worse overall.
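(For reference, adding such an index would look roughly like this; idx_k is just an example name:)

ALTER TABLE queue ADD INDEX idx_k (k);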
How can I further optimize this type of queue? Should I consider a different design? A different database? All solutions are welcome :)
[EDIT]
The engine is MyISAM.
EXPLAIN queue

Field       Type                   Null  Key
tid         int(11)                NO    PRI
queuetype   tinyint(1)             NO
k           mediumint(6) unsigned  NO
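In other words, the table corresponds roughly to this definition (reconstructed from the output above; the primary key on tid is the only index):

CREATE TABLE queue (
  tid INT(11) NOT NULL,              -- Twitter user id
  queuetype TINYINT(1) NOT NULL,     -- state: 1, 2, 9, 99 or -1 as described above
  k MEDIUMINT(6) UNSIGNED NOT NULL,  -- 0 = unclaimed, otherwise the claiming process number
  PRIMARY KEY (tid)
) ENGINE=MyISAM;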