16

How can I cast a sequence back to vector after a sequence producing operation (like sort)? Does using (vec..) on a sequence that was a vector is costly?

One (bad?) possibility is creating a new vector out of sequence:

(vec (sort [1 2 3 4 5 6]))

I am asking because I need random access (nth ..) to huge sorted vectors - which are now huge sequences after the sort, with horrible O(n) random access time

4 Answers 4

7

Meikel Brandmeyer just posted a solution to this on the Clojure group.

(defn sorted-vec
  [coll]
  (let [arr (into-array coll)]
    (java.util.Arrays/sort arr)
    (vec arr)))

Clojure's sort returns a seq across a sorted array; this approach does much the same thing, but returns a vector, not a seq.

If you wish, you can even skip the conversion back into a Clojure persistent data structure:

(defn sorted-arr
  "Returns a *mutable* array!"
  [coll]
  (doto (into-array coll)]
    (java.util.Arrays/sort))

but the resulting Java array (which you can treat as a Clojure collection in most cases) will be mutable. That's fine if you're not handing it off to other code, but be careful.

Sign up to request clarification or add additional context in comments.

3 Comments

Should be java.util.Arrays/sort (he forgot the s). But if it is the same thing, how come it it 4 times faster ? I posted in the Clojure group the timings.
It's not 4 times faster, at least on a server VM (which you should be using). c.c.sort uses an explicit Comparator, and returns a seq. It also calls to-array, not into-array: the former returns an array of Objects, whilst the latter returns a typed array. Apart from those things, which make sort more general, it's the same code. Those generalities cost. The whole purpose of this exercise is to avoid some of that work; that's why this version is faster.
5

From my own tests (nothing scientific) you may be better with working directly on arrays in cases where you do lots of sorting. But if you sort rarely and have a lots of random access to do though, going with a vector may be a better choice as random access time is more than 40% faster on average, but the sorting performance is horrible due to converting the vector to an array and then back to a vector. Here's my findings:

(def foo (int-array (range 1000)))

(time
  (dotimes [_ 10000]
    (java.util.Arrays/sort foo)))

; Elapsed time: 652.185436 msecs

(time
  (dotimes [_ 10000]
    (nth foo (rand-int 1000))))

; Elapsed time: 7.900073 msecs

(def bar (vec (range 1000)))

(time
  (dotimes [_ 10000]
    (vec (sort bar))))

; Elapsed time: 2810.877103 msecs

(time
  (dotimes [_ 10000]
    (nth bar (rand-int 1000))))

; Elapsed time: 5.500802 msecs

P.S.: Note that the vector version doesn't actually store the sorted vector anywhere, but that shouldn't change the result considerably as you would use simple bindings in a loop for speed.

1 Comment

Nice. Now it is easy to see that doing (vec) on sorted vector is 4 times slower than sorting direct arrays! The random access time is so fast in both vector and array that the 40% doesn't matter I think.
5

If you need to random access on the result of sort with huge vectors, then the time took by the call to vec should be far outweighed by time savings of doing so.

If you profile and find that it is too slow, you'll probably have to use java arrays.

1 Comment

That's what I am doing now. calling vec. But I wonder if there is some better way
-1

As a new Clojure developer, it is easy to confuse collections and sequences.

This sorted vector function:

(sort [1 2 3 4 5 6]) => (1 2 3 4 5 6) ; returns a sequence

But I need a vector for the next operation because this does not work...

(take-while (partial > 3) (1 2 3 4 5 6))

=>ClassCastException java.lang.Long cannot be cast to clojure.lang.IFn user/eval2251 (NO_SOURCE_FILE:2136)

Let us try to convert the sequence to a vector:

(vec (1 2 3 4 5 6))

=>ClassCastException java.lang.Long cannot be cast to clojure.lang.IFn user/eval2253 (NO_SOURCE_FILE:2139)

Nope! But if you put it all together, it works just fine.

(take-while (partial > 3) (sort [1 2 3 4 5 6]))

=>(1 2)

The lesson: You cannot work with sequences directly! They are an intermediate step in the process. When the REPL tries to evaluate (1 2 3 4 5 6), it sees a a function and throws an exception:

(1 2 3 4 5 6) =>ClassCastException java.lang.Long cannot be cast to clojure.lang.IFn user/eval2263 (NO_SOURCE_FILE:2146)

1 Comment

This is misleading. Evaluating (sort [1 2 3 4 5 6]) in the REPL followed by (take-while (partial > 3) *1) works just fine. If you just take the string representation of a sequence, you lose the type information.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.