Memoize a Clojure function that takes a lazy sequence as input

Question

How can I have memoize work when the argument to a memoised function is a sequence

(defn foo
    ([x] (println "Hello First") (reduce + x))
    ([x y] (println "Hello Second") (reduce + (range x y))))


(def baz (memoize foo))

Passing one arg:

1)

(time (baz (range 1 1000000))) ;=> Hello First "Elapsed time: 14.870628 msecs"

2)

(time (baz (range 1 1000000))) ;=> "Elapsed time: 65.386561 msecs"

Passing 2 args:

1)

(time (baz 1 1000000)) ;=> Hello Second "Elapsed time: 18.619768 msecs"

2)

(time (baz 1 1000000)) ;=> "Elapsed time: 0.069684 msecs"

The second run of the function when passed 2 arguments seems to be what I expect.

However using a vector appears to work...

(time (baz [1 2 3 5 3 5 7 4 6 7 4 45 6 7])) ;=> Hello First "Elapsed time: 0.294963 msecs"


(time (baz [1 2 3 5 3 5 7 4 6 7 4 45 6 7])) ;=> "Elapsed time: 0.068229 msecs"

Hmm. How to handle that is a tricky thing -- one needs to consume the whole sequence to determine identity for comparison purposes, but one of the advantages of being able to support ISeq is potential for laziness. So I can see the argument for the current semantics -- that we don't want to increase the up-front cost of a call (to fully realize a sequence that could, in theory, be infinite) in order to potentially enable caching. — Charles Duffy
– Charles Duffy, Commented Jul 31, 2016 at 15:13
...at an implementation layer, the question is what happens when a sequence is used as a key in a map. ie. do sequences support .hashCode in a way that reflects their contents (and is thus O(n) requiring evaluation), in a way that reflects their identity (such that the same sequence will only compare with itself), or not at all? — Charles Duffy
– Charles Duffy, Commented Jul 31, 2016 at 15:15
@CharlesDuffy - any implementation of sequence comparison (hash codes etc.) can only optimise the case of inequality. So paradoxically right when memoize is designed to be useful - when you have the equal key in the map - you will end up with the O(n) comparison time, no matter what the implementation is. The moral of the story - memoize is not designed to be used with long sequences as parameters. — Yuri Steinschreiber
– Yuri Steinschreiber, Commented Jul 31, 2016 at 19:38

Yuri Steinschreiber · Accepted Answer · 2016-07-31 18:20:35Z

memoize does work with sequences, you just need to compare apples to apples. memoize looks up the parameter in the hash map of previously used ones, and as a result you end up comparing the sequences. Comparing long sequences is what takes a long time, whether they are vectors or not:

user> (def x (vec (range 1000000)))
;; => #'user/x
user> (def y (vec (range 1000000)))
;; => #'user/y
user> (time (= x y))
"Elapsed time: 64.351274 msecs"
;; => true
user> (time (baz x))
"Elapsed time: 67.42694 msecs"
;; => 499999500000
user> (time (baz x))
"Elapsed time: 73.231174 msecs"
;; => 499999500000

When you use very short input sequences, the timing is dominated by the reduce inside your function. But with very long ones most of the time you see is actually the comparison time inside memoize.

So technically memoize works, in the same way for all sequences. But working "technically" doesn't imply "being useful." As you have discovered yourself, it is useless (actually maybe even harmful) for input with expensive comparison semantics. Your second signature solves this problem.

Collectives™ on Stack Overflow

Memoize a Clojure function that takes a lazy sequence as input

1 Answer 1

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

Comments

Your Answer

Sign up or log in

Post as a guest

Related