2

What are the best simple ways to speed up this function? The equivalent code in java is nearly 50 times faster according to Criterium.

I bet if I use a java Array and reduce the amount of boxing that would all help, but I thought I'd post first here to see if there was any basic mistakes I was making that could easily be fixed. Note I've already indicated (double...) for Clojure, which greatly improved the performance, but still nothing like Java. I've also first converted the seq using (double-array...) instead of using (vec ...) inside the function, and that also improved performance, but again, nothing like Java.

(defn cosine-similarity [ma mb]
  (let [va (vec ma), vb (vec mb)]
    (loop [p (double 0)
           na (double 0)
           nb (double 0)
           i (dec (count va))]
      (if (neg? i)
        (/ p (* (Math/sqrt na) (Math/sqrt nb)))
        (let [a (double (va i))
              b (double (vb i))]
          (recur (+ p (* a b))
                 (+ na (* a a))
                 (+ nb (* b b))
                 (dec i)))))))

Note that ma and mb are both seqs, containing 200 Doubles each. In the java version they are passed as double[] args.

0

2 Answers 2

5

Using (double 0) has no performance benefit you wouldn't get from specifying 0.0 (a double literal) directly.

You will get significantly better performance if you pass in ma and mb as double-array and hint the args as doubles, don't convert them to vector via vec, and use aget to do the element lookup. This should leave you with something very close to the java code's performance.

The double calls inside the let block won't be needed if you use double arrays as the function args.

the end result should look something like this:

(defn cosine-similarity [^doubles ma ^doubles mb]
  (loop [p 0.0
         na 0.0
         nb 0.0
         i (dec (count va))]
    (if (neg? i)
      (/ p (* (Math/sqrt na) (Math/sqrt nb)))
      (let [a (aget va i)
            b (aget vb i)]
        (recur (+ p (* a b))
               (+ na (* a a))
               (+ nb (* b b))
               (dec i))))))
Sign up to request clarification or add additional context in comments.

3 Comments

Thanks. Clojure version is now only twice as slow as java, and 25 times faster than before!
But note passing in the args as double-array didn't help, it was providing the typehint ^doubles and then using aget as you suggest.
I should edit that to be more clear: it requires both passing double-array, and hinting the arguments.
2

Did you try adding ?

(set! *unchecked-math* true)

Since you know the range, you could probably use it to gain additional speed.

Edit: @noisesmith is right, double-array and typing the input makes a tremendous difference.

Edit2: Getting blazing fast results after Alex Miller's comments.

(set! *unchecked-math* true)

(defn ^double cosine-similarity
  [^doubles va ^doubles vb] 
    (loop [p 0.0
    na 0.0
    nb 0.0
    i  (dec (alength va))]
  (if (< i 0)
    (/ p (* (Math/sqrt na) (Math/sqrt nb)))
    (let [a  (aget va i)
          b  (aget vb i)]
       (recur (+ p (* a b))
         (+ na (* a a))
         (+ nb (* b b))
         (dec i))))))

 (defn rand-double-arr [n m]
   (double-array
     (take n (repeatedly #(rand m)))))

 (def ma (rand-double-arr 200 10000))
 (def mb (rand-double-arr 200 10000))

 ; using do times
 (dotimes [_ 30] (time (cosine-similarity ma mb)))
 ; ...
 ; "Elapsed time: 0.003537 msecs"

 ; using criterium: [criterium "0.4.3"]
 (use 'criterium.core)
 (quick-bench (cosine-similarity ma mb))
 ; 
 ; Execution time mean           : 2.072280 µs
 ; Execution time std-deviation  : 214.653997 ns
 ; Execution time lower quantile : 1.765412 µs ( 2.5%)
 ; Execution time upper quantile : 2.284536 µs (97.5%)
                   Overhead used : 6.128119 ns

First version was in the 500~1000msecs range ...

6 Comments

ya I tried that, but it didn't seem to make any difference.
btw, what are the performances you're getting ? in Clojure vs Java ?
without type hints Clojure fn took 25 microseconds while java took 500 nanoseconds. With typehints Clojure took 1 microsecond.
Some other suggestions:
Some minor suggestions: 1) use alength instead of count (avoids unboxing result) 2) type hint the return of cosine-similarity to a double (avoids boxing on the return) 3) time with something like (dotimes [_ 30] (time (cosine-similarity ma mb))) to see the jit warm up (or use criterium).
|

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.