
I'll most likely be using memcache to cache some database results. As I have never implemented caching before, I thought it would be a good idea to ask those of you who have. The system I'm writing may at some point have several scripts running concurrently. This is what I'm planning to do:

  1. I'm writing a banner exchange system.
  2. The information about the banners is stored in the database.
  3. Various sites, each with different traffic, load a PHP script that generates the code for those banners (so that the banners are displayed on the client's site).
  4. When a banner is displayed for the first time, it gets cached with memcache.
  5. The banner has a cache lifetime of, for example, 1 hour.
  6. Every hour the cache is renewed.

The potential problem I see is at steps 4 and 6. If we have, for example, 100 high-traffic sites, several instances of the script may run simultaneously. How can I guarantee that when the cache expires it will be regenerated only once and the data will stay intact?

1 Answer

How could I guarantee that when the cache expires it'll get regenerated once and the data will be intact?

The approach to caching I take is, for lack of a better word, a "lazy" implementation. That is, you don't cache something until it has been retrieved once, in the hope that someone will need it again. Here's the pseudocode for that algorithm:

// returns false if there is no value or the value is expired
result = cache_check(key)

if (!result)
{
    result = fetch_from_db()

    // set it for next time, until it expires anyway
    cache_set(key, result, expiry)
}

This works well for our purposes, as long as you use the cache intelligently and understand that not all information behaves the same way. For example, in a hypothetical user comment system, you don't need an expiry time: you can simply invalidate the cache entry whenever a user posts a new comment on an article, so the next time the comments are loaded they're re-cached. Some information, however (weather data comes to mind), should get an explicit expiry time, since no user action triggers an update to that data.
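The comment-system case can be sketched the same way: entries get no expiry, and the write path deletes the key instead. `Cache`, `comments_db`, `load_comments`, and `post_comment` are hypothetical names for illustration; with a real memcache client the delete would be sent to the server.

```python
class Cache:
    """Minimal dict-backed stand-in for a memcache client (hypothetical)."""

    def __init__(self):
        self._store = {}

    def get(self, key):
        return self._store.get(key)

    def set(self, key, value):
        # No expiry: the entry lives until it is explicitly deleted.
        self._store[key] = value

    def delete(self, key):
        self._store.pop(key, None)


comments_db = {}  # article_id -> list of comments (hypothetical storage)


def load_comments(cache, article_id):
    key = f"comments:{article_id}"
    comments = cache.get(key)
    if comments is None:
        comments = list(comments_db.get(article_id, []))
        cache.set(key, comments)
    return comments


def post_comment(cache, article_id, text):
    comments_db.setdefault(article_id, []).append(text)
    # Invalidate instead of expiring: the next load re-reads the database.
    cache.delete(f"comments:{article_id}")
```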

For what it's worth, memcache works well in a clustered environment, and setting one up isn't hard, so this approach should scale to whatever you need.


5 Comments

In my situation, though, the cache has to have an expiration time, because users can't do anything other than read (the "reading the comments" part of your hypothetical). Thanks for the information! :)
Agreed, your situation is more like the weather-data example I used. But the algorithm I spelled out should still work for your case, and it will be more fail-safe than just filling the cache on a fixed schedule (although adding scheduled population on top of it might help).
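Scheduled population can sit on top of the lazy read path rather than replace it. In this sketch all names are hypothetical and `TTLCache` again stands in for a memcache client: `refresh_banners` would be run periodically (for example hourly from cron), while `get_banner` still repopulates any entry the job missed.

```python
import time


class TTLCache:
    """Hypothetical stand-in for a memcache client with per-key expiry."""

    def __init__(self):
        self._store = {}  # key -> (value, expires_at)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None or time.monotonic() >= entry[1]:
            return None
        return entry[0]

    def set(self, key, value, ttl):
        self._store[key] = (value, time.monotonic() + ttl)


def fetch_banner(banner_id):
    """Hypothetical database query."""
    return f"<banner {banner_id}>"


def refresh_banners(cache, banner_ids, ttl=3600):
    """Scheduled job: rewrite every known banner entry with a fresh TTL."""
    for banner_id in banner_ids:
        cache.set(f"banner:{banner_id}", fetch_banner(banner_id), ttl)


def get_banner(cache, banner_id, ttl=3600):
    """Lazy read path: the fail-safe if the scheduled job is late or fails."""
    key = f"banner:{banner_id}"
    value = cache.get(key)
    if value is None:
        value = fetch_banner(banner_id)
        cache.set(key, value, ttl)
    return value
```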
Just one more question: with the code above, will I have problems with simultaneously running scripts? If, for example, two scripts run at the same time, how will they agree on which one sets the cache and which one reads it? Both would see that the cached result doesn't exist and try to set it. Will that cause an error or some other problem?
It shouldn't be an issue at all. As you said, they will both read from the data source, and one will overwrite the other. That's fine as long as you're not doing dependent operations (like incrementing cache values). If you're just fetching data and caching it, that data should be the same during that brief window. If it might not be, restrict the boundaries of your data to a predictable interval (so if you're running the script at 12 AM, only consider data created up until 11:59:59 PM).
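The boundary trick can be expressed as a cutoff computed from the run time. This sketch assumes each row carries a hypothetical `created_at` timestamp; `period_start` and `banners_visible_at` are illustrative names. With `period="day"`, runs at 12:00:01 AM and 12:00:05 AM both use midnight as the cutoff, so a row created at 12:00:03 AM is ignored by both and they cache identical data. `period="hour"` would match the one-hour cache lifetime from the question.

```python
from datetime import datetime


def period_start(now: datetime, period: str = "hour") -> datetime:
    """Truncate `now` to the start of its hour (or day): the shared cutoff for this run."""
    if period == "day":
        return now.replace(hour=0, minute=0, second=0, microsecond=0)
    return now.replace(minute=0, second=0, microsecond=0)


def banners_visible_at(rows, now: datetime, period: str = "hour"):
    """Keep only rows created strictly before the cutoff, so concurrent runs agree."""
    cutoff = period_start(now, period)
    return [row for row in rows if row["created_at"] < cutoff]
```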
Thanks again for the help, Jimmy! Everything seems perfectly logical :)
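If the duplicate database reads ever do become a problem at high traffic (the regenerate-once concern from the question), a commonly used alternative, not part of the answer above, is a short-lived lock built on memcache's atomic `add`, which stores a key only if it doesn't already exist (the PHP Memcached extension exposes this as `Memcached::add`). The Python sketch below assumes `FakeMemcache` as a hypothetical in-process stand-in for the server; the `:lock` key suffix and the timeouts are also assumptions.

```python
import threading
import time


class FakeMemcache:
    """Thread-safe stand-in for a memcache client (hypothetical).

    add() mimics memcached's add command: it stores the value only if the
    key is absent (or expired) and reports whether it did.
    """

    def __init__(self):
        self._store = {}  # key -> (value, expires_at)
        self._mutex = threading.Lock()

    def _live(self, key):
        # Caller holds the mutex. Drops the entry if it has expired.
        entry = self._store.get(key)
        if entry is not None and time.monotonic() >= entry[1]:
            del self._store[key]
            entry = None
        return entry

    def get(self, key):
        with self._mutex:
            entry = self._live(key)
            return None if entry is None else entry[0]

    def set(self, key, value, ttl):
        with self._mutex:
            self._store[key] = (value, time.monotonic() + ttl)

    def add(self, key, value, ttl):
        with self._mutex:
            if self._live(key) is not None:
                return False
            self._store[key] = (value, time.monotonic() + ttl)
            return True

    def delete(self, key):
        with self._mutex:
            self._store.pop(key, None)


def get_or_regenerate(cache, key, regenerate, ttl=3600, lock_ttl=10):
    while True:
        value = cache.get(key)
        if value is not None:
            return value
        # Only the caller that wins the add() rebuilds the entry.
        if cache.add(key + ":lock", 1, lock_ttl):
            try:
                # Re-check: another caller may have finished between our get and add.
                value = cache.get(key)
                if value is None:
                    value = regenerate()
                    cache.set(key, value, ttl)
                return value
            finally:
                cache.delete(key + ":lock")
        # Someone else is regenerating; wait briefly and retry.
        time.sleep(0.01)
```

The lock has its own expiry so a crashed regenerator can't block other requests forever, and the value is written before the lock is released, so any later lock winner finds it on the re-check. The trade-off is that other requests wait briefly during regeneration instead of all querying the database.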
