
I'm building a script that traverses Twitter users, analyses the language of their tweets and, if the right language is found, adds all their friends and followers to a queue. These users are in turn picked from the queue, and the process is repeated again and again. To keep the DB fast, I'm using the same table for all the states a user can have in the queue ("to be analyzed for language" = 1, "to be fetched" = 2, "in progress" = 9, "done" = 99 and "blocked" = -1). That way I can just add all friends/followers to the table without having to check whether the person already exists in the table (each Twitter user should of course only be analyzed once).
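The single-table design described above can be sketched as follows, assuming Python's built-in sqlite3 module as a stand-in for MySQL/MyISAM (SQLite spells INSERT IGNORE as INSERT OR IGNORE). Only the numeric state codes come from the question; the QueueType member names and the create_queue/enqueue helpers are hypothetical. The primary key on tid is what turns a duplicate friend/follower into a silent no-op instead of a second row.

```python
import sqlite3
from enum import IntEnum
from typing import List


class QueueType(IntEnum):
    # Numeric codes from the question; the member names are hypothetical.
    TO_ANALYZE = 1
    TO_FETCH = 2
    IN_PROGRESS = 9
    DONE = 99
    BLOCKED = -1


def create_queue(conn: sqlite3.Connection) -> None:
    # Mirrors the question's three columns. tid is the primary key, so each
    # Twitter user can occupy at most one row, whatever its state.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS queue ("
        " tid INTEGER PRIMARY KEY,"
        " queuetype INTEGER NOT NULL,"
        " k INTEGER NOT NULL DEFAULT 0)"
    )


def enqueue(conn: sqlite3.Connection, tids: List[int]) -> int:
    """Queue users for language analysis; return how many were actually new."""
    if not tids:
        return 0
    before = conn.total_changes
    # One multi-row statement, like INSERT IGNORE ... VALUES (1,1),(2,1),...
    # SQLite caps the number of bound parameters, so huge batches need chunking.
    placeholders = ",".join(["(?, ?)"] * len(tids))
    params = [v for tid in tids for v in (tid, int(QueueType.TO_ANALYZE))]
    conn.execute(
        "INSERT OR IGNORE INTO queue (tid, queuetype) VALUES " + placeholders,
        params,
    )
    conn.commit()
    # Rows skipped because their tid already existed are not counted as changes.
    return conn.total_changes - before
```

For example, enqueue(conn, [1, 2, 3]) followed by enqueue(conn, [2, 3, 4]) leaves four rows in the table, and the second call reports a single new user.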

INSERT IGNORE INTO queue (tid,queuetype) VALUES (1,1),(2,1) ... (xxx,1);

This is quite fast. But as the table grows (a couple of million rows), selecting the next user from the queue becomes slower and slower.

Right now, I do it this way ($uniqueid is actually the process number):

UPDATE queue SET k='$uniqueid', queuetype = '9' WHERE k='0' AND queuetype = '1' LIMIT 1

followed by:

SELECT tid FROM queue WHERE k='$uniqueid' LIMIT 1

I then do all the magic, and finally change the queuetype to a new value (done, blocked, etc.).
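The claim / read back / finish cycle can be sketched like this, again assuming sqlite3 as a stand-in for MySQL. Default SQLite builds do not accept UPDATE ... LIMIT, so the sketch picks the row with a scalar subquery instead; MySQL restricts subqueries that read the table being updated, so that part does not carry over directly. claim_next, finish and worker_id (playing the role of $uniqueid) are hypothetical names. The question does not say whether k is reset after processing, so the read-back here also filters on queuetype = 9 so that rows this worker already finished are not returned again.

```python
import sqlite3
from typing import Optional

# State codes taken from the question.
TO_ANALYZE, IN_PROGRESS, DONE = 1, 9, 99


def create_queue(conn: sqlite3.Connection) -> None:
    conn.execute(
        "CREATE TABLE queue (tid INTEGER PRIMARY KEY,"
        " queuetype INTEGER NOT NULL, k INTEGER NOT NULL DEFAULT 0)"
    )


def claim_next(conn: sqlite3.Connection, worker_id: int) -> Optional[int]:
    """Mark one waiting user as in progress for worker_id and return its tid."""
    # Step 1, the question's UPDATE ... LIMIT 1. Default SQLite builds reject
    # UPDATE ... LIMIT, so a scalar subquery selects the single target row.
    # If nothing is waiting, the subquery yields NULL and no row matches.
    conn.execute(
        "UPDATE queue SET k = ?, queuetype = ? "
        "WHERE tid = (SELECT tid FROM queue"
        " WHERE k = 0 AND queuetype = ? LIMIT 1)",
        (worker_id, IN_PROGRESS, TO_ANALYZE),
    )
    conn.commit()
    # Step 2, the question's SELECT tid ... WHERE k = ... The extra queuetype
    # filter skips rows this worker already finished, which still carry its k.
    row = conn.execute(
        "SELECT tid FROM queue WHERE k = ? AND queuetype = ? LIMIT 1",
        (worker_id, IN_PROGRESS),
    ).fetchone()
    return row[0] if row else None


def finish(conn: sqlite3.Connection, tid: int, new_state: int = DONE) -> None:
    # Step 3: move the row to its final state (done, blocked, ...).
    conn.execute("UPDATE queue SET queuetype = ? WHERE tid = ?", (new_state, tid))
    conn.commit()
```

Because each worker writes its own id into k, two workers calling claim_next one after the other end up with different rows, and a worker with nothing left to claim gets None.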

Can the solution be further optimized? The "SELECT tid" is very slow and takes multiple seconds to run. If I add an index to k, selecting becomes faster but updating becomes very slow, and the overall result is worse.
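One way to see the read side of the index trade-off is to ask the query planner. Below is a sketch with sqlite3's EXPLAIN QUERY PLAN (in MySQL the counterpart is prefixing the SELECT with EXPLAIN); idx_queue_k is a hypothetical index name. It only shows that the lookup on k switches from a full table scan to an index search; it says nothing about the write cost reported in the question, which comes from the index having to be updated every time k changes.

```python
import sqlite3
from typing import List


def plan(conn: sqlite3.Connection, sql: str, params: tuple = ()) -> List[str]:
    """Return SQLite's query-plan detail strings for sql."""
    # The human-readable detail is the last column of each plan row.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE queue (tid INTEGER PRIMARY KEY,"
    " queuetype INTEGER NOT NULL, k INTEGER NOT NULL DEFAULT 0)"
)
query = "SELECT tid FROM queue WHERE k = ? LIMIT 1"

before = plan(conn, query, (7,))
# idx_queue_k is a hypothetical name for the index on k discussed above.
conn.execute("CREATE INDEX idx_queue_k ON queue (k)")
after = plan(conn, query, (7,))

print("without index:", before)  # a full scan of queue
print("with index:   ", after)   # a search through idx_queue_k
```

The exact wording of the plan strings differs between SQLite versions, so compare the index name rather than the full text.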

How can I further optimize this type of queue? Should I consider a different design? A different database? All solutions are welcome :)

[EDIT]

Engine is MyISAM.

EXPLAIN queue

Field      Type                   Null  Key  Default  Extra
tid        int(11)                NO    PRI
queuetype  tinyint(1)             NO
k          mediumint(6) unsigned  NO
Perhaps you could expand a little on your index? What type is it? What storage engine are you using? What do you get if you try EXPLAIN PLAN? That sort of thing. Commented Apr 4, 2012 at 9:28

2 Answers


You may be experiencing slowness due to table-level locking in the MyISAM engine.

Table-Level Locking

MySQL uses table-level locking for MyISAM, MEMORY, and MERGE tables, permitting only one session to update those tables at a time. This locking level makes these storage engines more suitable for read-only, read-mostly, or single-user applications.

You may refer to this answer.

According to the MySQL documentation on Internal Locking Methods, you may want to use the InnoDB engine instead, since your use case involves concurrent access to the table. InnoDB uses row-level locking. You should also keep the index on k so that the SELECT stays fast as the table grows.


I would suggest that if you want fast INSERT performance and only want to search on exact matches, then you need a hash index. But perusing the documentation, I learn that MySQL only supports explicit hash indexes in the NDB storage engine and in the MEMORY engine, whose tables do not survive a server restart.

I don't know anything about NDB, so I would hesitate to recommend it, but it might be worth a try if it isn't too inconvenient.

See also here.
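To illustrate why a hash index fits equality lookups such as WHERE k = '$uniqueid', here is a toy in-memory model (not how NDB or MEMORY actually implement their HASH indexes): a dictionary from a key value to the set of tids holding it. Looking up or changing a row's key touches one or two buckets on average, independent of table size, whereas a B-tree index (what MyISAM uses for ordinary indexes) needs a logarithmic walk. The trade-off is that a hash cannot answer range or ordering queries. HashIndex and its methods are hypothetical.

```python
from collections import defaultdict


class HashIndex:
    """Toy equality-only index mapping a key value (e.g. k) to row ids (tid)."""

    def __init__(self):
        self._buckets = defaultdict(set)

    def add(self, key, tid):
        self._buckets[key].add(tid)

    def move(self, tid, old_key, new_key):
        # Changing k on a row = remove it from one bucket, add it to another.
        self._buckets[old_key].discard(tid)
        if not self._buckets[old_key]:
            del self._buckets[old_key]  # drop empty buckets
        self._buckets[new_key].add(tid)

    def lookup(self, key):
        # Equality only: keys are unordered, so "k > 5" cannot be served here.
        # Returns a copy so callers cannot mutate the index by accident.
        return set(self._buckets.get(key, ()))
```

For instance, after adding tids 1, 2 and 3 under key 0, idx.move(2, 0, 7) models the question's UPDATE ... SET k = 7, and idx.lookup(7) then returns {2}.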

1 Comment

Interesting, will look it up! But actually, it's the SELECT that is slow, not the INSERT. The INSERTs are quite fast...
