
I have developed a LAMP application based on the CodeIgniter framework and MySQL/InnoDB. It is basically a photo community. The front-end of the application has a file uploader that allows the selection and uploading of multiple photos at once. Next, these photos are handled by a back-end processing script (a model in CI terms). Thus, if 6 photos are uploaded, the back-end script is called 6 times, in parallel. I need to keep the file uploader (and processing) multi-threaded, as it brings great speed advantages.

So the back-end script is called multiple times in parallel. This script takes care of many tasks, most of which work fine in a multi-threaded scenario; one part of the script, however, does not.

Uploaders are awarded "medals" for every 10 photos uploaded. This check is done at the end of the back-end script. Here is a scenario where this goes wrong: say the uploader had uploaded 9 photos before, and is now uploading 4 more. The first instance of the back-end script should conclude that the newly uploaded photo pushes the count to 10, so a medal should be awarded. The desired outcome is for the other 3 back-end tasks not to award a medal, as one has already been given for crossing the count of 10. The actual outcome, however, is that all 4 instances of the back-end task, as they are running in parallel, reach the same conclusion, and thus 4 medals are handed out.
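
In simplified form, the vulnerable check-then-act pattern looks roughly like this (a sketch with a hypothetical award_medal() helper, not my literal code):

    // Each parallel instance runs this after saving its own image record.
    $count = $this->db->query(
        "SELECT COUNT(id) AS count FROM image WHERE user_id = ? AND status = 'active'",
        array($user_id)
    )->row()->count;

    if ($count > 0 && $count % 10 == 0) {
        // All 4 parallel instances can reach this point with the same $count,
        // because nothing serializes this read against the other instances' writes.
        $this->award_medal($user_id); // hypothetical helper
    }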

Clearly, this part of my back-end script is not thread-safe and behaves incorrectly when run in parallel. My thinking was that if I simply wrapped this part of the back-end script in a transaction, it would become thread-safe. In pseudocode:

    // bulk of back-end processing task: thread-safe by design, so not in a transaction
    $this->db->trans_begin();
    // medal handling code here
    $this->db->trans_commit();

My theory was that wrapping this non-thread-safe part in a transaction would make it thread-safe: the transaction would lock the tables touched and guarantee fresh reads and writes, and the locks would only be held very briefly.

It seems my theory is wrong, though: the issue persists after this change. It is quite hard to reproduce, but I have seen 2 occasions where it failed again.

Is there a conceptual flaw in my thinking? How do I make this small part of my back-end task safe to run in parallel?

Extra information to discuss Zak's answer:

Detailed steps of my back-end script:

  1. basic validation (check if the file was received and such) - no DB activity
  2. save the image record in the database (this increases the number of images for the user, which matters for the medal part)
  3. process the uploaded file into thumbnails (no DB activity, yet takes a long time, say 20 seconds)
  4. determine the current number of images ever uploaded for the current user
  5. if the count reaches 10, or an exact multiple of 10, reward with a medal (see the sketch after this list)
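
In (assumed) code form, one instance of the script therefore does roughly this; note the long gap between the write in step 2 and the read in step 4:

    // Skeleton of one back-end instance (hypothetical method names).
    $this->save_image_record($user_id, $file);      // step 2: adds one image row
    $this->generate_thumbnails($file);              // step 3: ~20 s, no DB activity
    $count = $this->count_active_images($user_id);  // step 4
    if ($count > 0 && $count % 10 == 0) {           // step 5: medal condition
        $this->award_medal($user_id);
    }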

To try out Zak's answer, I have rewritten the image count query into this:

SELECT COUNT(id) as count FROM image WHERE user_id = ? AND status='active' FOR UPDATE

This is step 4 of the back-end script. To make debugging easier, I am writing the image count at this step to a log file. Next, I repeatedly upload sets of images and check the log for the counters. Unfortunately, I still get situations where multiple parallel processes report the same image count, which leads to multiple medals.

I have played around with variations: I have moved step 4 up to step 2, and I have tried wrapping steps 2 and 4 in a transaction (after moving them above step 3), roughly as sketched below. All to no avail; I still cannot reliably get a correct image count in all cases.
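
For reference, the transaction-wrapped variation (steps 2 and 4 moved above step 3) looks roughly like this, again with placeholder helper names:

    $this->db->trans_begin();

    $this->save_image_record($user_id, $file); // step 2 (hypothetical helper)

    // step 4, with the locking read suggested by Zak:
    $count = $this->db->query(
        "SELECT COUNT(id) AS count FROM image WHERE user_id = ? AND status = 'active' FOR UPDATE",
        array($user_id)
    )->row()->count;

    log_message('debug', "image count for user $user_id: $count");

    if ($count > 0 && $count % 10 == 0) {
        $this->award_medal($user_id); // step 5 (hypothetical helper)
    }

    $this->db->trans_commit();

    $this->generate_thumbnails($file); // step 3, now after the transaction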

2 Answers


I don't really know if I'm giving you an idea or talking gibberish, but in terms of concept, not code: maybe you can just give the medals a few seconds later, after all transactions are processed. I've seen many scripts and games use this concept. The reward is not given in real time. After the user has finished uploading photos - it doesn't matter how the upload is handled - he gets a message that he has a new medal. He can be notified when he browses to another page, for example.
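
As a rough sketch of what I mean (I'm assuming a medal table and a CodeIgniter model method; adjust to your schema):

    // Run once after the whole upload batch has finished (or from a cron job).
    // It awards however many medals the photo count justifies, so it doesn't
    // matter how many parallel uploads happened or in which order.
    function sync_medals($user_id)
    {
        $this->db->trans_begin();

        $photos = $this->db->query(
            "SELECT COUNT(id) AS c FROM image WHERE user_id = ? AND status = 'active'",
            array($user_id)
        )->row()->c;

        $given = $this->db->query(
            "SELECT COUNT(id) AS c FROM medal WHERE user_id = ?",
            array($user_id)
        )->row()->c;

        // one medal per 10 photos; hand out only the missing ones
        for ($i = $given; $i < floor($photos / 10); $i++) {
            $this->db->insert('medal', array('user_id' => $user_id));
        }

        $this->db->trans_commit();
    }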


1 Comment

Thanks, last night I had a similar idea. If I fail to solve it in the current setup, this will be my next option.

Transactions don't necessarily give you the kind of locking you want. Check out:

http://dev.mysql.com/doc/refman/5.0/en/set-transaction.html#isolevel_read-committed

You want a locking read. Basically, on your SELECT you say FOR UPDATE, and this will prevent other parallel locking reads from reading that data until the transaction is committed.

Note: the FOR UPDATE just tacks onto the end of your query, and this only works if your table is InnoDB.
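
Rough sketch of what I mean, using your table names (note that FOR UPDATE only holds its lock while a transaction is open, so the read and the dependent write have to share one):

    $this->db->trans_begin();

    $count = $this->db->query(
        "SELECT COUNT(id) AS count FROM image WHERE user_id = ? AND status = 'active' FOR UPDATE",
        array($user_id)
    )->row()->count;

    // a second parallel transaction running the same SELECT ... FOR UPDATE
    // now waits here until this transaction commits
    if ($count > 0 && $count % 10 == 0) {
        $this->db->insert('medal', array('user_id' => $user_id)); // assuming a medal table
    }

    $this->db->trans_commit(); // locks released here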

1 Comment

Thanks so much @Zak. I have a feeling that your answer is correct, yet I still cannot get it applied correctly. I have updated the question to provide more detail and findings. I'm hoping you can have another look at it, as I'm truly stuck :(
