Distributed computing framework for Clojure/Java

Question

I'm developing an application where I need to distribute a set of tasks across a potentially quite large cluster of different machines.

Ideally I'd like a very simple, idiomatic way to do this in Clojure, e.g. something like:

; create a clustered set of machines
(def my-cluster (new-cluster list-of-ip-addresses))

; define a task to be executed
(deftask my-task (my-function arg1 arg2))

; run a task 10000 times on the cluster
(def my-job (run-task my-cluster my-task {:repeat 10000}))

; do something with the results:
(some-function (get-results my-job))

Bonus if it can do something like Map-Reduce on the cluster as well.

What's the best way to achieve something like this? Maybe I could wrap an appropriate Java library?

UPDATE:

Thanks for all the suggestions of Apache Hadoop. It looks like it might fit the bill, but it seems like overkill since I don't need a distributed data storage system like the one Hadoop uses (i.e. I don't need to process billions of records). Something more lightweight and focused on compute tasks only would be preferable, if it exists.

In the Clojure Google group there have been discussions about Terracotta, GridGain, and the Java standards (JMS, JXTA/Shoal, Jini); you can search for them.
– Gene T, Commented Mar 26, 2011 at 3:14

David J. · Accepted Answer · 2012-07-18 01:56:03Z

8

Hadoop is the base for almost all the large-scale big data excitement in the Clojure world these days, though there are better ways than using Hadoop directly.

Cascalog is a very popular front end:

    Cascalog is a tool for processing data on Hadoop with Clojure in a concise and
    expressive manner. Cascalog combines two cutting edge technologies in Clojure 
    and Hadoop and resurrects an old one in Datalog. Cascalog is high performance, 
    flexible, and robust.

Also check out Amit Rathore's swarmiji, a distributed worker framework built on top of RabbitMQ. It's less focused on data processing and more on distributing a fixed number of tasks to a pool of available computing power. (P.S. It's in his book, Clojure in Action.)

edited Jul 18, 2012 at 1:56

David J.

answered Feb 26, 2011 at 21:38

Arthur Ulfeldt

2 Comments

mikera Over a year ago

Thanks, this looks very interesting, although I'm more interested in distributed processing than in big data. Hadoop seems more focused on the latter?

user61051 Over a year ago

In my experience Hadoop is quite awkward; it's a lot of work to set up a cluster, and it's tricky to debug. If you don't need HDFS, it's much simpler just to use a queueing system like RabbitMQ to ship around s-expressions. There are a few libraries existing to do this, but wrapping the Java RabbitMQ client in a couple pages of Clojure code is easy enough to be in "left as an exercise for the reader" territory.
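To make the "ship around s-expressions" idea concrete, here is a minimal sketch under stated assumptions. A task is plain data such as `(add 1 2)`, serialized with `pr-str` and parsed back with `clojure.edn/read-string`. The names `task-registry`, `encode-task`, `run-task`, `submit!`, and `work-once!` are hypothetical, and a `LinkedBlockingQueue` stands in for the broker. None of this is the RabbitMQ client API; a real setup would publish the string to a queue and consume it on the workers.

```clojure
(ns sketch.sexp-tasks
  (:require [clojure.edn :as edn])
  (:import (java.util.concurrent LinkedBlockingQueue)))

;; Hypothetical whitelist of operations a worker is willing to run.
;; Dispatching by name avoids calling eval on data received from the wire.
(def task-registry
  {'add    +
   'square (fn [x] (* x x))})

(defn encode-task
  "Serialize a task such as (add 1 2) to a string usable as a message body."
  [task]
  (pr-str task))

(defn run-task
  "Parse a serialized task and apply the registered function to its arguments."
  [message]
  (let [[op & args] (edn/read-string message)
        f (or (task-registry op)
              (throw (ex-info "Unknown task" {:op op})))]
    (apply f args)))

;; In-memory stand-in for a broker queue (an assumption for this sketch;
;; a real deployment would use RabbitMQ or a similar message broker).
(def task-queue (LinkedBlockingQueue.))

(defn submit!
  "Put a serialized task on the queue."
  [task]
  (.put task-queue (encode-task task)))

(defn work-once!
  "Take one message off the queue and return the task's result."
  []
  (run-task (.take task-queue)))
```

For example, `(submit! '(square 7))` followed by `(work-once!)` returns 49. Looking tasks up in a registry rather than evaluating arbitrary forms is a design choice of this sketch, so that workers only run functions they already know about.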

Flux · Accepted Answer · 2019-05-28 14:36:26Z

7

Although I haven't gotten to use it yet, I think that Storm is something that you might find useful to explore:

Storm is a distributed realtime computation system. Similar to how Hadoop provides a set of general primitives for doing batch processing, Storm provides a set of general primitives for doing realtime computation. Storm is simple, can be used with any programming language, and is a lot of fun to use!

edited May 28, 2019 at 14:36

Flux

answered Sep 25, 2011 at 15:51

David J.

Thomas Jungblut · Accepted Answer · 2011-02-26 16:53:49Z

4

Hadoop is exactly what you need: Apache Hadoop

answered Feb 26, 2011 at 16:53

Thomas Jungblut

Flux · Accepted Answer · 2019-05-28 14:12:24Z

3

Storm might suit your needs better than Hadoop, as it has no distributed data storage and has low latency. It's possible to split up and process data in a way similar to MapReduce; the Trident API makes this very simple.

It is partly written in Clojure, so I suppose Clojure interop is easier.

Another option is Onyx, which offers similar functionality but is a pure Clojure project.

edited May 28, 2019 at 14:12

Flux

answered Sep 7, 2012 at 15:26

ChrisBlom
