I wrote an edit distance algorithm both in clojure and scala.
The scala version runs 70x faster than the clojure one.
clojure:
(defn edit-distance
"['seq of char' 'seq of char']"
[s0 s1]
(let [n0 (count s0)
n1 (count s1)
distances (make-array Long/TYPE (inc n0) (inc n1))]
;;initialize distances
(doseq [i (range 1 (inc n0))] (aset-long distances i 0 i))
(doseq [j (range 1 (inc n1))] (aset-long distances 0 j j))
(doseq [i (range 1 (inc n0)), j (range 1 (inc n1))]
(let [ins (aget distances i (dec j))
del (aget distances (dec i) j)
match (aget distances (dec i) (dec j))
min-dist (min ins del match)]
(cond
(not= match min-dist) (aset-long distances i j (inc min-dist))
(not= (nth s0 (dec i)) (nth s1 (dec j))) (aset-long distances i j (inc min-dist))
:else (aset-long distances i j min-dist))))
(aget distances n0 n1)))
scala:
def editDistance(s0: Array[Char], s1: Array[Char]):Int = {
val n0 = s0.length
val n1 = s1.length
val distances = Array.fill(n0+1)(ArrayBuffer.fill(n1+1)(0))
for(j <- 0 to n1){distances(0)(j) = j}
for(i <- 0 to n0){distances(i)(0) = i}
for(i <- 1 to n0; j <- 1 to n1){
val ins = distances(i)(j-1)
val del = distances(i-1)(j)
val matches = distances(i-1)(j-1)
val minDist = (ins::del::matches::Nil).reduceLeft(_ min _)
if (matches != minDist)
distances(i)(j) = minDist + 1
else if (s0(i-1) == s1(j-1))
distances(i)(j) = minDist
else
distances(i)(j) = minDist + 1
}
distances(n0)(n1)
}
I am using java's array in clojure to get the best performance. I have considered hinting whenever agetis called but my code performs even worse (which might be expected as make-array already defines a typed array). I have also overridden clojure :jvm-opts in projects.clj. Yet the lower performance gap I get is 70x.
What's wrong with my use of java array in clojure?
Thanks for insight.