2

This is a clojure program to read integers from a file and count the number of inversions i.e the number of times a larger number appears before a smaller number in a sequence. It implements a O(n^2) algorithm, and takes about 30 minutes to run on an input of size 100,000.

(defn count-inversions
  [file]
  (let [int-list (vec (map #(Integer/parseInt %)
                           (clojure.string/split (slurp file) #"\r\n")))]
    (loop [c 0
           t-list int-list]
      (let [size (count t-list)]
        (if (empty? t-list)
         c
         (recur (reduce #(if (< %2 (first t-list)) (inc %1) %1) c (subvec t-list 1 (dec  size)))
                (subvec t-list 1 (dec size))))))))

This algorithm when implemented in Java takes only a few seconds to complete. Why such a large difference?

1
  • Try calling time on various parts to see where the time is being spent. Using a long-array and indexes might speed things up, but I can't see it being 4 orders of magnitude different. Commented Mar 25, 2012 at 7:13

1 Answer 1

5

Things that look potentially slow to me (though you'll have to benchmark to be sure....)

  • All your numbers are boxed. This will make anything you do much slower than using primitives. It isn't very idiomatic but you can use primitive arrays in Clojure if you want to get this speedup.
  • Reducing over a subvec currently creates a lot of temporary objects. There is a patch in the works to fix this (http://dev.clojure.org/jira/browse/CLJ-894) but you might have to wait for Clojure 1.4 or 1.5 for this.
  • It helps to do (set! *unchecked-math* true) to improve performance of integer operations (obviously, you need to be a bit more careful if overflows might happen)

If I wanted to get this to run really fast in Clojure I'd probably resort to primitive arrays and looping with primitive indexes:

(set! *unchecked-math* true)

(defn count-inversions [coll]
     (let [^ints integers (int-array coll)
           size (long (count integers))]
       (loop [c (long 0)
              i (long 0)]
         (if (>= i size)
           c
           (recur
             (loop [c (long c)
                    v (aget integers i)
                    j (inc i)]
               (if (>= j size)
                 c
                 (recur (long (if (> v (aget integers j)) (inc c) c)) v (inc j))))
             (inc i))))))

(time (count-inversions (for [i (range 100000)] (rand-int 100))))
"Elapsed time: 16036.651863 msecs"

i.e. this code counts all the inversions in 100,000 integers in about 16 seconds on my machine (with Clojure 1.3)

Sign up to request clarification or add additional context in comments.

2 Comments

Would you post a link to the definition of a boxed number? I'm curious about the term. Thank you.
It basically means that a number is being packaged up as an Object: lots of links on Google, here is one: docs.oracle.com/javase/1.5.0/docs/guide/language/…

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.