Before experiments I've added next line in project.clj:
:jvm-opts ^:replace [] ; Makes measurements more accurate
Basic measurements:
(def a (double-array (range 1000000))) ; 10 is too small for performance measurements
(quick-bench (sum-of-squares a)) ; ... Execution time mean : 27.617748 ms ...
(quick-bench (sum-of-squares2 a)) ; ... Execution time mean : 1.259175 ms ...
This is more or less consistent with time difference in the question. Let's try to not use Java arrays (which are not really idiomatic for Clojure):
(def b (mapv (partial * 1.0) (range 1000000))) ; Persistent vector
(quick-bench (sum-of-squares b)) ; ... Execution time mean : 14.808644 ms ...
Almost 2 times faster. Now let's remove type hints:
(defn sum-of-squares3
"Given a vector v, compute the sum of the squares of elements."
[v]
(r/fold + (r/map #(* % %) v)))
(quick-bench (sum-of-squares3 a)) ; Execution time mean : 30.392206 ms
(quick-bench (sum-of-squares3 b)) ; Execution time mean : 15.583379 ms
Execution time increased only marginally comparing to version with type hints. By the way, version with transducers has very similar performance and is much cleaner:
(defn sum-of-squares3 [v]
(transduce (map #(* % %)) + v))
Now about additional type hinting. We can indeed optimize first sum-of-squares implementation:
(defn square ^double [^double x] (* x x))
(defn sum-of-squares4
"Given a vector v, compute the sum of the squares of elements."
[v]
(r/fold + (r/map square v)))
(quick-bench (sum-of-squares4 b)) ; ... Execution time mean : 12.891831 ms ...
(defn pl
(^double [] 0.0)
(^double [^double x] (+ x))
(^double [^double x ^double y] (+ x y)))
(defn sum-of-squares5
"Given a vector v, compute the sum of the squares of elements."
[v]
(r/fold pl (r/map square v)))
(quick-bench (sum-of-squares5 b)) ; ... Execution time mean : 9.441748 ms ...
Note #1: type hints on arguments and return value of sum-of-squares4 and sum-of-squares5 have no additional performance benefits.
Note #2: It's generally bad practice to start with optimizations. Straight-forward version (apply + (map square v)) will have good enough performance for most situations. sum-of-squares2 is very far from idiomatic and uses literally no Clojure concepts. If this is really performance critical code - better to implement it in Java and use interop. Code will be much cleaner despite of having 2 languages. Or even implement it in unmanaged code (C, C++) and use JNI (not really maintainable but if properly implemented, can give the best possible performance).