Both Rulle's solution and leetwinski's second one use iterate in quite an anamorphic manner (especially the latter). Let's define unfold using iterate (usually the opposite is done) and write an explicitly anamorphic solution:
(defn unfold [f s]
(->> s
(list nil)
(iterate (comp f second))
rest
(take-while some?)
(map first)))
(defn merge-sorted [s]
(->> s
(filter seq)
(unfold
(fn [s]
(if (seq s)
(loop [[[mf & mn :as m] & s] s, r ()]
(if-let [[[xf :as x] & s] s]
(let [[m x] (if (< xf mf) [x m] [m x])]
(recur (cons m s) (cons x r)))
(list mf (if mn (cons mn r) r)))))))))
UPDATE
Here is a version of merge-sorted that uses reduce instead of loop and recur:
(defn merge-sorted [s]
(->> s
(filter seq)
(unfold
(fn [s]
(if (seq s)
(let [[[mf & mn] r]
(reduce
(fn [[m r] x]
(if (< (first x) (first m))
[x (cons m r)]
[m (cons x r)]))
[(first s) ()]
(rest s))]
(list mf (if mn (cons mn r) r))))))))
UPDATE'
Impressed by the efficiency of A. Webb's solution after the edit and considering this problem interesting and relatively important, I had a look at the Wikipedia articles on merge and k-way merge algorithms and at this paper. I found out that there is plenty of room for analysis/experimentation/improvement and decided to (re)implement and test several algorithms. Here they are, packed in a map, preceded by some helper functions and followed by some functions useful for testing:
(require ['clojure.core.reducers :as 'reducers])
(defn mapmap [f m]
(reduce #(update %1 %2 f) m (keys m)))
(defn unfold [f s]
(->> s
(list nil)
(iterate (comp f second))
rest
(take-while some?)
(map first)))
(defn foldh [f s]
((fn rec [v]
(f (if (> (count v) 2)
(let [h (quot (count v) 2)]
(map rec [(subvec v 0 h) (subvec v h)]))
v)))
(vec s)))
(defn fold2 [f s]
(loop [s s]
(if (nnext s)
(recur (map f (partition-all 2 s)))
(f s))))
(def merge-sorted
(merge
;direct lazy algorithms
(mapmap
(fn [[prepare choose insert]]
(fn [s]
(->> s
(filter seq)
prepare
(unfold
(fn [s]
(if (seq s)
(let [[[xf & xn] s] (choose s)]
[xf (if xn (insert s xn) s)])))))))
{:min
[identity
(fn [s]
(reduce
(fn [[x s] y]
(if (< (first x) (first y))
[x (cons y s)]
[y (cons x s)]))
[(first s) ()]
(rest s)))
conj]
:sort
[(partial sort-by first)
(juxt first rest)
(fn [s [xf :as x]]
(let [[a b] (loop [a () b (seq s)]
(if-let [[bf & bn] b]
(if (< (first bf) xf)
(recur (cons bf a) bn)
[a b])
[a b]))]
(into (cons x b) a)))]
:lsort
[(partial sort-by first)
(juxt first rest)
(fn [s [xf :as x]]
((fn rec [s]
(lazy-seq
(if-let [[sf] (seq s)]
(if (< (first sf) xf)
(cons sf (rec (rest s)))
(cons x s))
(list x))))
s))]
:heap
[(fn [s]
(let [h (java.util.PriorityQueue.
(count s)
#(< (first %1) (first %2)))]
(run! #(.add h %) s)
h))
(fn [h] [(.poll h) h])
(fn [h x] (.add h x) h)]})
;folding lazy algorithms
(mapmap
(letfn [(merge2 [s]
(lazy-seq
(if-let [[x & s] (seq (filter seq s))]
(if-let [[y] s]
((fn rec [x y]
(lazy-seq
(let [[[xf & xn] y]
(if (< (first x) (first y))
[x y]
[y x])]
(cons xf (if xn (rec xn y) y)))))
x y)
x))))]
(fn [fold] (partial fold merge2)))
{:foldl #(reduce (comp %1 list) %2)
:foldh foldh
:fold2 fold2})
;folding eager algorithms
(mapmap
(letfn [(merge2 [s]
(if-let [[x & s] (seq (filter seq s))]
(if-let [[y] s]
(loop [x x y y acc ()]
(let [[[xf & xn] y]
(if (< (first x) (first y))
[x y]
[y x])
acc (conj acc xf)]
(if xn
(recur xn y acc)
(into y acc))))
x)
()))]
(fn [fold] (partial fold merge2)))
{:efoldp #(reducers/fold 2 (comp %1 list) (comp %1 list) (vec %2))
:efoldh foldh
:efold2 fold2})))
(defn gen-inp [m n]
(->> 0
(repeat m)
(map
(comp
doall
(partial take n)
rest
(partial iterate #(+ % (rand-int 100)))))
doall))
(defn test-merge-sorted [m n & algs]
(->> (or algs (sort (keys merge-sorted)))
(map (juxt name merge-sorted))
(run!
(let [inp (gen-inp m n)]
(fn [[id alg]]
(println id)
;(java.lang.System/gc)
(try
(time (doall (alg inp)))
(catch java.lang.StackOverflowError _
(prn "Stack overflow"))))))))
The direct lazy algorithms follow a common scheme parameterized by how the following are done:
- preprocessing the input
- computing one chosen sequence and the rest ones
- inserting the tail of the chosen sequence into the rest ones
:min is like my first solution, which calculates a minimum at each iteration.
:sort is like Rulle's solution, which sorts the sequences initially and makes a sorted-insertion at each iteration.
:lsort is like :sort but with lazy insertion. It can cause stack overflow because of nested lazy sequences.
:heap is a simple but suboptimal implementation of heap-merge using Java's PriorityQueues.
The folding lazy algorithms follow a common scheme parameterized by how a <=2-ary merge is extended to arbitrary arities.
:foldl is like A. Webb's solution before the edit, which does a left folding using reduce. It can cause stack overflow because of nested lazy sequences.
:foldh is an implementation of divide-and-conquer-merge, which folds by splitting in half.
:fold2 is like A. Webb's solution after the edit, which folds by splitting in pairs.
The folding eager algorithms follow a scheme like that of the lazy ones but using an eager <=2-ary merge.
:efoldp is a parallel implementation of divide-and-conquer-merge using clojure.core.reducers/fold, which does various <=2-ary merges concurrently and possibly in parallel, by "forking" every time it splits in half.
:efoldh and :efold2 are like :foldh and :fold2 but eager.
As a short remark on performance I would say that, for fast lazy merging, one of :foldh, :fold2 or maybe :heap should be chosen. As for eager merging, depending on the hardware's parallelism ability and on the input's shape, :efoldp can be faster than the rest. For more information see the linked articles.
loop/recur(or plain Java for max speed).