Clojure Performance, How to Type hint to r/map

Question

Below, I have 2 functions computing the sum of squares of their arguments. The first one is nice and functional, but 20x slower than the second one. I presume that the r/map is not taking advantage of aget to retrieve elements from the double-array, whereas I'm explicitly doing this in function 2.

Is there any way I can further typehint or help r/map r/fold to perform faster?

(defn sum-of-squares
  "Given a vector v, compute the sum of the squares of elements."
  ^double [^doubles v]
  (r/fold + (r/map #(* % %) v)))

(defn sum-of-squares2
  "This is much faster than above.  Post to stack-overflow to see."
  ^double [^doubles v]
  (loop [val 0.0
         i (dec (alength v))]
    (if (neg? i)
      val
      (let [x (aget v i)]
        (recur (+ val (* x x)) (dec i))))))

(def a (double-array (range 10)))
(quick-bench (sum-of-squares a))

800 ns

(quick-bench (sum-of-squares2 a))

40 ns

Jarlax · Accepted Answer · 2015-01-27 23:07:54Z

7

Before experiments I've added next line in project.clj:

:jvm-opts ^:replace [] ; Makes measurements more accurate

Basic measurements:

(def a (double-array (range 1000000))) ; 10 is too small for performance measurements
(quick-bench (sum-of-squares a)) ; ... Execution time mean : 27.617748 ms ...
(quick-bench (sum-of-squares2 a)) ; ... Execution time mean : 1.259175 ms ...

This is more or less consistent with time difference in the question. Let's try to not use Java arrays (which are not really idiomatic for Clojure):

(def b (mapv (partial * 1.0) (range 1000000))) ; Persistent vector
(quick-bench (sum-of-squares b)) ; ... Execution time mean : 14.808644 ms ...

Almost 2 times faster. Now let's remove type hints:

(defn sum-of-squares3
"Given a vector v, compute the sum of the squares of elements."
[v]
(r/fold + (r/map #(* % %) v)))

(quick-bench (sum-of-squares3 a)) ; Execution time mean : 30.392206 ms
(quick-bench (sum-of-squares3 b)) ; Execution time mean : 15.583379 ms

Execution time increased only marginally comparing to version with type hints. By the way, version with transducers has very similar performance and is much cleaner:

(defn sum-of-squares3 [v]
  (transduce (map #(* % %)) + v))

Now about additional type hinting. We can indeed optimize first sum-of-squares implementation:

(defn square ^double [^double x] (* x x))

(defn sum-of-squares4
  "Given a vector v, compute the sum of the squares of elements."
  [v]
  (r/fold + (r/map square v)))

(quick-bench (sum-of-squares4 b)) ; ... Execution time mean : 12.891831 ms ...

(defn pl
  (^double [] 0.0)
  (^double [^double x] (+ x))
  (^double [^double x ^double y] (+ x y)))

(defn sum-of-squares5
  "Given a vector v, compute the sum of the squares of elements."
  [v]
  (r/fold pl (r/map square v)))

(quick-bench (sum-of-squares5 b)) ; ... Execution time mean : 9.441748 ms ...

Note #1: type hints on arguments and return value of sum-of-squares4 and sum-of-squares5 have no additional performance benefits.

Note #2: It's generally bad practice to start with optimizations. Straight-forward version (apply + (map square v)) will have good enough performance for most situations. sum-of-squares2 is very far from idiomatic and uses literally no Clojure concepts. If this is really performance critical code - better to implement it in Java and use interop. Code will be much cleaner despite of having 2 languages. Or even implement it in unmanaged code (C, C++) and use JNI (not really maintainable but if properly implemented, can give the best possible performance).

answered Jan 27, 2015 at 23:07

Jarlax

1,58610 silver badges20 bronze badges

Sign up to request clarification or add additional context in comments.

2 Comments

Scott Klarenbach Over a year ago

Thanks. I'm aware that my v2 is not idiomatic, but the code is very performance sensitive (trillions of calculations) and I'd hate to have to reach for java everytime I have a patch of hot code. I would naturally prefer to use a generic clojure-esque version, but even a 10:1 performance slowdown is quite significant. So for this particular application I'll stick with my v2. I was just trying to have my cake and eat it too...approach the performance of v2 with the elegance of your transducers v3.

Scott Klarenbach Over a year ago

In other words, I'm treating v2 AS a piece of interop, without the overhead of 2 languages.

Rodrigo Taboada · Accepted Answer · 2015-01-29 06:05:01Z

1

Why not use areduce:

(def sum-of-squares3 ^double [^doubles v]
  (areduce v idx ret 0.0
           (let [item (aget v idx)]
             (+ ret (* item item)))))

On my machine running:

(criterium/bench (sum-of-squares3 (double-array (range 100000))))

Gives a mean execution time of 1.809103 ms, your sum-of-squares2 executes the same calculation in 1.455775 ms. I think this version using areduce is more idiomatic than your version.

For squeezing a little bit more performance you can try using unchecked math (add-unchecked and multiply-unchecked). But beware, you need to be sure that your calculation cannot overflow:

(defn sum-of-squares4 ^double [^doubles v]
  (areduce v idx ret 0.0
           (let [item (aget v idx)]
             (unchecked-add ret (unchecked-multiply item item)))))

Running the same benchmark gives a mean execution time of 1.144197 ms. Your sum-of-squares2 can also benefit from unchecked math with a 1.126001 ms mean execution time.

edited Jan 29, 2015 at 6:05

answered Jan 29, 2015 at 1:54

Rodrigo Taboada

2,7054 gold badges27 silver badges27 bronze badges

1 Comment

Scott Klarenbach Over a year ago

Thanks Rodrigo. I wasn't aware of areduce. That's exactly what I needed, a way to tell a reduction to use aget...

Collectives™ on Stack Overflow

Clojure Performance, How to Type hint to r/map

2 Answers 2

2 Comments

1 Comment

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

2 Comments

1 Comment

Your Answer

Sign up or log in

Post as a guest

Related