I'm implementing a system in which, when it comes to the heavy mathematical lifting, I want to do as little work as possible.

I'm aware that there are issues with memoising numpy objects (they aren't hashable), so I implemented a lazy-key cache to sidestep the whole "premature optimisation" argument:

def magic(self, numpyarg, intarg):
    key = str(numpyarg) + str(intarg)

    try:
        return self._cache[key]
    except KeyError:
        pass

    ... here be dragons ...
    self._cache[key] = value
    return value

But string conversion takes quite a while:

t = timeit.Timer("str(a)", "import numpy; a = numpy.random.rand(10,10)")
t.timeit(number=100000) / 100000   # 0.00132 s per call

What do people suggest as being 'the better way' to do it?

3 Answers

Borrowed from this answer... so really I guess this is a duplicate:

>>> import hashlib
>>> import numpy
>>> a = numpy.random.rand(10, 100)
>>> b = a.view(numpy.uint8)
>>> hashlib.sha1(b).hexdigest()
'15c61fba5c969e5ed12cee619551881be908f11b'
>>> t=timeit.Timer("hashlib.sha1(a.view(numpy.uint8)).hexdigest()", 
                   "import hashlib;import numpy;a=numpy.random.rand(10,10)") 
>>> t.timeit(number=10000)/10000
2.5790500640869139e-05
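Plugged back into the question's cache, the digest can simply replace the `str()` key. A minimal sketch (the class name and the `_compute` stand-in are illustrative, not from the question):

```python
import hashlib

import numpy as np


class Magic:
    def __init__(self):
        self._cache = {}

    def magic(self, numpyarg, intarg):
        # Hash the raw buffer instead of str()-ing the array:
        # roughly two orders of magnitude cheaper per call.
        key = hashlib.sha1(numpyarg.view(np.uint8)).hexdigest() + str(intarg)
        try:
            return self._cache[key]
        except KeyError:
            pass
        value = self._compute(numpyarg, intarg)
        self._cache[key] = value
        return value

    def _compute(self, numpyarg, intarg):
        # Hypothetical stand-in for the "here be dragons" work.
        return float(numpyarg.sum()) * intarg
```

The key is still a plain string, so nothing else about the caching logic has to change.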

3 Comments

Nice! For multidimensional arrays this gives a different hash (for the "same" array) depending on whether it's fortran or c contiguous. If that's an issue, calling np.ascontiguousarray should solve it.
Not sure why SHA-1, a known slow hash function, was chosen. SHA-1 is fine for minimising hash collisions but poor at speed. For speed you'd want something like MurmurHash or xxHash (the latter claims to be even faster).
@CongMa, thanks for the extra info. There are lots of options! But as you'll notice, this is already two orders of magnitude faster. And speed is never the only concern. It's probably worth using a well-understood hash if the alternative is only a few millionths of a second faster.
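The contiguity caveat above can be demonstrated in a short sketch (assuming the sha1-of-bytes recipe from the answer). A Fortran-ordered array can't even be viewed as `uint8`, because its last axis isn't contiguous; `np.ascontiguousarray` normalises the layout so that equal-valued arrays hash equally:

```python
import hashlib

import numpy as np

a = np.random.rand(5, 3)
f = np.asfortranarray(a)  # same values, column-major memory layout

# f.view(np.uint8) would raise ValueError here: the last axis of a
# Fortran-ordered 2-D array is not contiguous.  Normalise the layout first:
h_a = hashlib.sha1(a.view(np.uint8)).hexdigest()
h_f = hashlib.sha1(np.ascontiguousarray(f).view(np.uint8)).hexdigest()

assert h_a == h_f  # same values, same hash once layout is normalised
```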
There is a package for this called joblib, found via this question.

from joblib import Memory
location = './cachedir'
memory = Memory(location)

# Create caching version of magic
magic_cached = memory.cache(magic)
result = magic_cached(...)

# Or (for one-time use)
result = memory.eval(magic, ...)

1 Comment

It would be better to quote the relevant parts of those links in your answer, in case the websites go offline.
For small numpy arrays, this might also be suitable:

tuple(map(float, a))

where a is the numpy array.

1 Comment

Oh yes, a tuple is hashable, unlike a list!
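A minimal sketch of this trick as a cache key (the function name and the cached computation are illustrative; note that `tuple(map(float, a))` only works for 1-D arrays, since `float()` rejects a whole row):

```python
import numpy as np

cache = {}

def slow_square_sum(a):
    # Tuples are hashable, so they can serve as dict keys directly.
    key = tuple(map(float, a))          # 1-D arrays only
    if key not in cache:
        cache[key] = float((a ** 2).sum())
    return cache[key]

a = np.array([1.0, 2.0, 3.0])
assert slow_square_sum(a) == 14.0
assert slow_square_sum(a) == 14.0      # second call is served from the cache
assert len(cache) == 1
```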
