
I'm working on a method that, due to the expense of its operation and the frequency with which it is called with identical arguments, would benefit from caching its return values.

I'll be serialize()-ing the arguments together for cache keys, but this can result in very long keys, due to the lengthy array arguments.

  • Do PHP array indexing and look-up suffer from such long keys (think 250 B to 1 kB+)?
  • So far so good, but am I facing a situation where this could fail spectacularly on me at some point?
  • Basically, should I md5() (or alternative) the keys?

Minor clarifications:
This is only per-request caching, with no permanent storage. The method in question is that of a view helper, and for each view generation it may be called 500 times or more.
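
For reference, the shape of what I have in mind is roughly this (class and method names are placeholders, not the real helper):

    class ExpensiveViewHelper
    {
        /** @var array Per-request cache; it lives only for the duration of the request. */
        private $cache = array();

        public function render($template, array $options = array(), array $context = array())
        {
            // serialize() all arguments, then md5() the result for a fixed 32-character key.
            $key = md5(serialize(func_get_args()));

            if (!array_key_exists($key, $this->cache)) {
                $this->cache[$key] = $this->doExpensiveWork($template, $options, $context);
            }

            return $this->cache[$key];
        }

        private function doExpensiveWork($template, array $options, array $context)
        {
            // Stand-in for the costly operation whose result is being cached.
            return sprintf('%s rendered with %d options', $template, count($options));
        }
    }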

  • Somehow it's weird to use MD5 instead of serialize. Commented Jul 9, 2011 at 19:34
  • @Hans Wassink - md5() the result of serialize() is what I was getting at. Commented Jul 9, 2011 at 19:36
  • Well anyways, all hail to fyr :D Commented Jul 9, 2011 at 19:42

2 Answers


You should definitely hash the key. You might ask, "Why should I risk collisions when I can concatenate a unique key every time?" The simple answer is that if you generate cache keys via string concatenation, you always have to assume the worst-case space requirement when estimating RAM usage.

So if you have a cache with 200 entries and 2 fields of at most 20-character strings, the worst case is 200 * 2 * 20 * (size of a character). If you load the complete cache on every possible parallel connection, this is multiplied by the number of parallel connections.

With hashes, the minimum RAM requirement for the key field always equals the maximum.

If many values are concatenated into the keys, this scales very badly.

Edit:

Even if you only use it per request, the array consumes memory. Although it is a cache, it is present from the beginning to the end of the request, so you need to take into account that it occupies a certain amount of memory; with a hash, that amount is fixed per key.

The second point is that keys need to be compared. If you access an associative array with a string key, the interpreter has to compare the key character by character. If you generate the keys with a hash function, this is also a fixed number of steps.

If you use concatenation, the number of steps ranges between the best and the worst case.
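
To make the fixed-size point concrete, here is a small sketch (the argument shapes are made up purely for illustration):

    $args = array(
        'template.phtml',
        array('deep' => array('nested' => range(1, 50))),
        array('user' => 'example'),
    );

    $rawKey    = serialize($args); // grows with the arguments (easily hundreds of bytes)
    $hashedKey = md5($rawKey);     // always 32 hex characters, regardless of input size

    echo strlen($rawKey), "\n";    // several hundred bytes for this example
    echo strlen($hashedKey), "\n"; // 32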


1 Comment

Thanks @fyr - As per my clarification, there is no permanent storage for this cache, as it is per-request. Regardless, my keys have the tendency to become enormous, ranging from 30B on the low end to as high as 1kB+; I'm doing a serialize(func_get_args()) with a 6 parameter function, 3 of which are often deep arrays. I'll benchmark it, but for both memory preservation and readability on debug dumps, I think I'll go for hashing.

It's certainly not typical to have such long array keys, but you'd probably have to benchmark to see how much slowdown it actually causes. If in doubt, though, just md5() it - from what you're saying, the speedup from caching will be so large compared to the md5() cost that the latter will be trivial.
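
For example, a quick way to check that the md5() overhead is negligible at ~500 calls per request (the iteration count and payload size below are assumptions, not measurements):

    $args = array(str_repeat('x', 1024)); // simulate arguments that serialize to roughly 1 kB

    $start = microtime(true);
    for ($i = 0; $i < 500; $i++) {        // ~500 helper calls per view, as in the question
        $key = md5(serialize($args));
    }
    printf("500 x md5(serialize()): %.3f ms\n", (microtime(true) - $start) * 1000);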
