I have a large CSV file of independent items, each of which takes a fair bit of effort to process, and I'd like to process the line items in parallel. I found a sample piece of code for processing a CSV file on SO here:
Newbie transforming CSV files in Clojure
The code is:
(use '(clojure.contrib duck-streams str-utils)) ;;'
(with-out-writer "coords.txt"
  (doseq [line (read-lines "coords.csv")]
    (let [[x y z p] (re-split #"," line)]
      (println (str-join \space [p x y z])))))
This was able to print out data from my CSV file which was great - but it only used one CPU. I've tried various different things, ending up with:
(pmap println (read-lines "foo"))
This works fine at the REPL but does nothing when run from the command line. From a conversation on IRC, it sounds like this is because stdout (`*out*`) isn't available by default to the threads that `pmap` uses.
Really, what I'm looking for is an idiomatic way to apply a function to each line of the CSV file in parallel. I'd also like to be able to print some results to stdout during testing, if at all possible.
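For what it's worth, here is a rough sketch of the shape I imagine the solution taking, using the newer `clojure.string`/`clojure.java.io` functions rather than `clojure.contrib` — `process-line` is just a placeholder for my real per-item work, and I'm forcing the lazy seq that `pmap` returns with `doall` so that the work actually happens outside the REPL:

```clojure
(require '[clojure.java.io :as io]
         '[clojure.string :as str])

;; Placeholder for the real per-item work: here, just reorder the fields.
(defn process-line [line]
  (let [[x y z p] (str/split line #",")]
    (str/join " " [p x y z])))

;; pmap returns a lazy seq, so doall forces it while the file is still
;; open; without that, nothing happens when run as a script.
(defn process-file [path]
  (with-open [rdr (io/reader path)]
    (doall (pmap process-line (line-seq rdr)))))

;; Printing from the main thread afterwards sidesteps any issue with
;; stdout inside pmap's worker threads.
(defn print-results [path]
  (doseq [result (process-file path)]
    (println result)))
```

Is something along those lines the right idiom, or is there a better approach?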
Any ideas?