How can we do parallel programming in Java? Is there any special framework for that? How can we make it work?

Let me tell you what I need. Suppose I have developed a web crawler that crawls a lot of data from the internet. A single crawling system will not be enough, so I need several systems working in parallel. In that case, can I apply parallel computing? Can you give me an example?

  • Depends on what you mean by "parallel programming". Commented Jul 28, 2010 at 6:43
  • @Stephen: Using 2 or more systems for one process, so that the process completes faster. Commented Jul 28, 2010 at 6:48
  • SIMD = single instruction, multiple data = operations on vectors, which are performed, for example, on graphics cards. "Parallel programming" is a very vague term; see Flynn's taxonomy on Wikipedia. And by the way, threading support should be covered in ANY book about Java basics. Commented Jul 28, 2010 at 7:27
  • Remember to respect robots.txt. Commented Jul 28, 2010 at 8:16
  • Sorry, I don't understand you. I think you want to know how to use threads, but I'm not sure. And as I've written, thread basics are described in almost every Java manual... Commented Jul 28, 2010 at 11:36

19 Answers

18

If you are asking about pure parallel programming, i.e. not concurrent programming, then you should definitely try MPJ Express (http://mpj-express.org/). It is a thread-safe implementation of mpiJava, and it supports both distributed and shared memory models. I have tried it and found it very reliable.

 1 import mpi.*;
 2
 3 /**
 4  * Compile: impl specific.
 5  * Execute: impl specific.
 6  */
 7
 8 public class Send {
 9
10     public static void main(String[] args) throws Exception {
11
12         MPI.Init(args);
13
14         int rank = MPI.COMM_WORLD.Rank(); // The current process.
15         int size = MPI.COMM_WORLD.Size(); // Total number of processes.
16         int peer;
17
18         int buffer[] = new int[10];
19         int len = 1;
20         int dataToBeSent = 99;
21         int tag = 100;
22
23         if (rank == 0) {
24
25             buffer[0] = dataToBeSent;
26             peer = 1;
27             MPI.COMM_WORLD.Send(buffer, 0, len, MPI.INT, peer, tag);
28             System.out.println("process <" + rank + "> sent a msg to " +
29                                "process <" + peer + ">");
30
31         } else if (rank == 1) {
32
33             peer = 0;
34             Status status = MPI.COMM_WORLD.Recv(buffer, 0, buffer.length,
35                                                 MPI.INT, peer, tag);
36             System.out.println("process <" + rank + "> recv'ed a msg\n" +
37                                "\tdata   <" + buffer[0] + "> \n" +
38                                "\tsource <" + status.source + "> \n" +
39                                "\ttag    <" + status.tag + "> \n" +
40                                "\tcount  <" + status.count + ">");
41
42         }
43
44         MPI.Finalize();
45
46     }
47
48 }

One of the most common features provided by messaging libraries like MPJ Express is support for point-to-point communication between executing processes. In this context, two processes belonging to the same communicator (for instance, MPI.COMM_WORLD) can communicate by sending and receiving messages. The sender process sends the message with a variant of the Send() method, and the receiver process receives it with a variant of the Recv() method. Both sender and receiver specify a tag, which is used to match incoming messages on the receiver side.

After initializing the MPJ Express library with the MPI.Init(args) method on line 12, the program obtains its rank and the size of the MPI.COMM_WORLD communicator. Both processes create an integer array of length 10 called buffer on line 18. The sender process (rank 0) stores the value 99 (dataToBeSent) in the first element of buffer, and a variant of the Send() method is used to send that element to the receiver process.

The sender process calls the Send() method on line 27. The first three arguments describe the data being sent: the send buffer (the buffer array) is the first argument, followed by 0 (offset) and 1 (count, held in len). The data being sent is of type MPI.INT and the destination is process 1 (the peer variable); the datatype and destination are the fourth and fifth arguments to Send(). The sixth and last argument is the tag variable. A tag is used to identify messages on the receiver side; it is typically an identifier of a particular message in a specific communicator. The receiver process (rank 1), in turn, receives the message using the blocking Recv() method on line 34.

5 Comments

Butt: Can you please say how I can implement it?
Sure. I assume you have downloaded the multicore version of MPJ Express. I have added a code snippet to my answer now. I have the API documentation; you can PM me if you want it.
Butt: How can I PM you? Can you please give me your email address?
What's your comment about JPPF and Hadoop?
Yes, I found out there is no PM functionality, which is rather bad. Mail me at [email protected]
10

Java supports threads, so you can write multithreaded Java applications. I strongly recommend the book Concurrent Programming in Java: Design Principles and Patterns for that:

http://java.sun.com/docs/books/cp/

1 Comment

Threads are okay, but in my opinion this needs to be said: THREADS ARE A DEAD END. Threads are only for trivial things like keeping the user interface responsive ("liveliness") while a long-running task proceeds concurrently; in other words, 2 threads. Sure, there are 12-core, 24-thread workstations, but the poster said he wants 1000s of threads. Use something like JPPF or maybe Hadoop.
9

You want to look at the Java Parallel Processing Framework (JPPF).

7

You can have a look at Hadoop and the Hadoop Wiki. This is an Apache framework inspired by Google's MapReduce. It enables you to do distributed computing using multiple systems. Many companies like Yahoo and Twitter use it (Sites Powered By Hadoop). Check this book for more information on how to use it: Hadoop Book.

5 Comments

I need to know more about Hadoop. Can you help me out?
@alex: The truth is I just came across this framework two days ago, so I don't have much idea about how to use it. I started reading the book (the one I linked). I think it can give a good start. Anyway, if you have any problems, feel free to ask; if I know, I'll help. You can check out the hadoop tag on SO: stackoverflow.com/questions/tagged/hadoop. This might help you understand the problems faced by Hadoop users in general.
@alex: Also, from what I understand, what you need is to distribute the load of a system across many machines. I don't think learning about threads or concurrency in Java will help you achieve this. I think Hadoop is the only open-source, stable framework that allows you to do this as of now. It might take some time to learn a new framework, but it's always better than 'reinventing the wheel'.
@alex: Cloudera is a distribution of Hadoop. It helps you configure your systems without much manual work (cloudera.com).
Is the book you linked to on RapidShare pirated?
3

In Java, parallel processing is done using threads, which are part of the runtime library.

The Concurrency Tutorial should answer a lot of questions on this topic if you're new to Java and parallel programming.

3

As far as I know, on most operating systems Java threads are based on real kernel threads. This is good from the parallel programming perspective. Other languages like Python (CPython, because of its global interpreter lock) simply time-multiplex the processor: if you run a heavily multithreaded application on a multiprocessor machine, you'll see only one processor running.

You can easily find more by googling it; for example, this is the first result for "java threading": http://download-llnw.oracle.com/javase/tutorial/essential/concurrency/

Basically, it boils down to extending the Thread class, overriding the run() method with the code that should execute in the other thread, and calling start() on an instance of your subclass.
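A minimal sketch of that recipe. The class names WorkerThread and ThreadDemo and the range-summing task are hypothetical, chosen only to illustrate extending Thread, overriding run(), and calling start():

```java
// Hypothetical worker: sums the integers in [from, to) on its own thread.
class WorkerThread extends Thread {
    private final int from;
    private final int to;
    private long sum; // written by this thread, read by the caller after join()

    WorkerThread(int from, int to) {
        this.from = from;
        this.to = to;
    }

    @Override
    public void run() {
        // This body executes on the new thread once start() is called.
        long s = 0;
        for (int i = from; i < to; i++) {
            s += i;
        }
        sum = s;
    }

    long getSum() {
        return sum;
    }
}

class ThreadDemo {
    // Sums 0..n-1 by splitting the range across two threads.
    static long parallelSum(int n) throws InterruptedException {
        WorkerThread first = new WorkerThread(0, n / 2);
        WorkerThread second = new WorkerThread(n / 2, n);
        first.start();  // start(), not run(): calling run() directly would use the caller's thread
        second.start();
        first.join();   // join() waits for completion and makes the thread's writes visible
        second.join();
        return first.getSum() + second.getSum();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(parallelSum(1000)); // prints 499500
    }
}
```

In newer code, passing a Runnable (or a lambda) to new Thread(...) is usually preferred over subclassing, since it keeps the task separate from the threading mechanism.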

Also, if you need to make something thread-safe, have a look at synchronized methods.
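As a rough illustration of synchronized methods, here is a hypothetical shared counter (SafeCounter and SyncDemo are illustrative names). Without the synchronized keyword, concurrent count++ calls could interleave and lose updates:

```java
// Hypothetical counter shared by several threads.
class SafeCounter {
    private int count = 0;

    // Only one thread at a time can run a synchronized method on a given instance,
    // so the read-modify-write in count++ cannot be interleaved.
    synchronized void increment() {
        count++;
    }

    synchronized int get() {
        return count;
    }
}

class SyncDemo {
    // Starts `threads` threads, each incrementing the shared counter `perThread` times.
    static int run(int threads, int perThread) throws InterruptedException {
        SafeCounter counter = new SafeCounter();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    counter.increment();
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            w.join(); // wait for every worker before reading the final value
        }
        return counter.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run(4, 10_000)); // prints 40000
    }
}
```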

2 Comments

@Decav: thanks for the info ;)
Threads are a dead end because you can't go massively parallel. The poster said he wanted 1000s of systems.
3

This is the parallel programming resource I've been pointed to in the past:

http://www.jppf.org/

I have no idea whether it's any good or not, just that someone recommended it a while ago.

2

I heard about one at a conference a few years ago: ParJava. But I'm not sure about the current status of the project.

1 Comment

I see you are new to StackOverflow and thought you may not know that people usually upvote for useful answers.
2

Read the section on threads in the Java tutorial: http://download-llnw.oracle.com/javase/tutorial/essential/concurrency/procthread.html

2 Comments

@AnderSen: OK, suppose I had developed an app that uses threads. How can it be operated from two systems?
A central master that knows what needs to be done, and a horde of slaves that get work from the master and report back?
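That master/worker layout can be sketched inside a single JVM, assuming BlockingQueue instances play the role of the network link between master and workers. The class MasterWorker, the POISON marker, and the string-length "work" are hypothetical placeholders; spreading this over real machines would need a messaging layer instead of in-memory queues:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Single-JVM sketch of master/worker. The two queues stand in for whatever
// network messaging would connect real machines (an assumption).
class MasterWorker {
    // Hypothetical shutdown marker; one is sent to each worker at the end.
    private static final String POISON = "__STOP__";

    static List<Integer> process(List<String> jobs, int workerCount) throws InterruptedException {
        BlockingQueue<String> work = new LinkedBlockingQueue<>();
        BlockingQueue<Integer> results = new LinkedBlockingQueue<>();

        // Workers: take a job, process it, report the result, until POISON arrives.
        List<Thread> workers = new ArrayList<>();
        for (int i = 0; i < workerCount; i++) {
            Thread w = new Thread(() -> {
                try {
                    while (true) {
                        String job = work.take();
                        if (job.equals(POISON)) {
                            return;
                        }
                        results.put(job.length()); // placeholder for real work such as crawling a URL
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            workers.add(w);
            w.start();
        }

        // Master: hand out every job, then one POISON per worker.
        for (String job : jobs) {
            work.put(job);
        }
        for (int i = 0; i < workerCount; i++) {
            work.put(POISON);
        }

        // Master: collect exactly one result per job (arrival order is not guaranteed).
        List<Integer> collected = new ArrayList<>();
        for (int i = 0; i < jobs.size(); i++) {
            collected.add(results.take());
        }
        for (Thread w : workers) {
            w.join();
        }
        return collected;
    }

    public static void main(String[] args) throws InterruptedException {
        List<Integer> lengths = process(Arrays.asList("http://a.example/", "http://b.example/"), 2);
        System.out.println(lengths.size()); // prints 2
    }
}
```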
2

The java.util.concurrent package and Brian Goetz's book "Java Concurrency in Practice".

There are also a lot of resources here about parallel patterns by Ralph Johnson (one of the authors of the GoF Design Patterns book): http://parlab.eecs.berkeley.edu/wiki/patterns/patterns
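As a small illustration of the java.util.concurrent building blocks mentioned above (ExecutorService, Callable, Future), here is a sketch; the squaring task and the names ExecutorDemo and squareAll are hypothetical stand-ins for real work such as fetching pages:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ExecutorDemo {
    // Squares each input on a fixed-size thread pool; results keep the input order.
    static List<Integer> squareAll(List<Integer> inputs, int poolSize)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            List<Callable<Integer>> tasks = new ArrayList<>();
            for (int x : inputs) {
                tasks.add(() -> x * x); // each Callable is one independent unit of work
            }
            // invokeAll blocks until every task finishes; futures come back in task order.
            List<Future<Integer>> futures = pool.invokeAll(tasks);
            List<Integer> results = new ArrayList<>();
            for (Future<Integer> f : futures) {
                results.add(f.get()); // a task's exception would surface as ExecutionException
            }
            return results;
        } finally {
            pool.shutdown(); // let the pool's threads exit
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(squareAll(Arrays.asList(1, 2, 3, 4), 2)); // prints [1, 4, 9, 16]
    }
}
```

A fixed-size pool bounds how many tasks run at once, which is usually what you want when the task list is large.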

2

Is the Ateji PX parallel-for loop what you're looking for? This will crawl all sites in parallel (notice the double bar next to the for keyword):

for||(Site site : sites) {
  crawl(site);
}

If you need to compose the results of crawling, you'll probably want to use a parallel comprehension, such as:

Set result = set for||{ crawl(site) | Site site : sites }

Further reading here : http://www.ateji.com/px/whitepapers/Ateji%20PX%20for%20Java%20v1.0.pdf
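Ateji PX is a language extension, so its for|| syntax will not compile with a standard javac. In plain Java 8 and later, parallel streams give a roughly analogous result. This is a swapped-in technique, not Ateji's API, and crawl() here is a hypothetical placeholder:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

class ParallelCrawlSketch {
    // Hypothetical stand-in for crawl(site): here it only normalizes the string.
    static String crawl(String site) {
        return site.trim().toLowerCase();
    }

    // Roughly analogous to `set for||{ crawl(site) | Site site : sites }`:
    // elements may be processed on different ForkJoinPool worker threads,
    // and the results are gathered into a Set.
    static Set<String> crawlAll(List<String> sites) {
        return sites.parallelStream()
                .map(ParallelCrawlSketch::crawl)
                .collect(Collectors.toSet());
    }

    public static void main(String[] args) {
        System.out.println(crawlAll(Arrays.asList(" A.example ", "b.example")).size()); // prints 2
    }
}
```

Parallel streams run on the shared ForkJoinPool common pool, so for I/O-bound work such as real crawling a dedicated ExecutorService is often the better fit.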

1

You might want to check out Hadoop. It's designed to run jobs over an arbitrary number of machines and takes care of all the bookkeeping for you. It's inspired by Google's MapReduce and related tools, so its roots are even in web indexing.

1

Have you looked at this:

http://www.javacodegeeks.com/2013/02/java-7-forkjoin-framework-example.html

The Fork/Join framework?

I am also trying to learn a bit about this.
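A minimal Fork/Join sketch (the RecursiveTask API is Java 7+; ForkJoinPool.commonPool() needs Java 8). The array-summing task and the threshold value are illustrative choices, not taken from the linked article:

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Sums an array by recursively splitting it until the pieces are small enough.
class SumTask extends RecursiveTask<Long> {
    private static final int THRESHOLD = 1_000; // arbitrary cutoff chosen for illustration
    private final long[] data;
    private final int from;
    private final int to;

    SumTask(long[] data, int from, int to) {
        this.data = data;
        this.from = from;
        this.to = to;
    }

    @Override
    protected Long compute() {
        if (to - from <= THRESHOLD) {
            // Small enough: sum sequentially.
            long sum = 0;
            for (int i = from; i < to; i++) {
                sum += data[i];
            }
            return sum;
        }
        int mid = (from + to) >>> 1;
        SumTask left = new SumTask(data, from, mid);
        SumTask right = new SumTask(data, mid, to);
        left.fork();                     // schedule the left half asynchronously
        long rightSum = right.compute(); // work on the right half in this thread
        return left.join() + rightSum;   // wait for the left half and combine
    }

    static long sum(long[] data) {
        return ForkJoinPool.commonPool().invoke(new SumTask(data, 0, data.length));
    }

    public static void main(String[] args) {
        long[] values = new long[100_000];
        for (int i = 0; i < values.length; i++) {
            values[i] = i + 1;
        }
        System.out.println(sum(values)); // prints 5000050000
    }
}
```

Forking one half and computing the other in the current thread is the usual idiom; it avoids leaving the current worker idle while both halves wait.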

1

Parallelism

Parallelism means that an application splits its tasks up into smaller subtasks which can be processed in parallel, for instance on multiple CPUs at the exact same time.
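To make the idea concrete, here is a hedged sketch that splits one job into chunks, runs each chunk as an asynchronous subtask with CompletableFuture (Java 8+), and combines the partial results. The counting task and the names SplitDemo and countMultiplesOf3 are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

class SplitDemo {
    // Splits [0, n) into about `parts` chunks (parts >= 1), counts the multiples of 3
    // in each chunk as an asynchronous subtask, then combines the partial counts.
    static long countMultiplesOf3(int n, int parts) {
        int chunk = (n + parts - 1) / parts; // ceiling division so no element is skipped
        List<CompletableFuture<Long>> partials = new ArrayList<>();
        for (int start = 0; start < n; start += chunk) {
            final int from = start;
            final int to = Math.min(n, start + chunk);
            // supplyAsync without an executor argument uses a default async executor
            // (normally ForkJoinPool.commonPool()).
            partials.add(CompletableFuture.supplyAsync(() -> {
                long count = 0;
                for (int i = from; i < to; i++) {
                    if (i % 3 == 0) {
                        count++;
                    }
                }
                return count;
            }));
        }
        long total = 0;
        for (CompletableFuture<Long> f : partials) {
            total += f.join(); // wait for each subtask and add its partial result
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(countMultiplesOf3(10, 3)); // prints 4
    }
}
```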

0

You can use JCSP (http://www.cs.kent.ac.uk/projects/ofa/jcsp/). The library implements CSP (Communicating Sequential Processes) principles in Java: parallelisation is abstracted away from the thread level, and you deal with processes instead.
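JCSP itself is a separate library, so rather than guess at its API, here is a rough analogue using only JDK classes: two threads that share no mutable state and communicate only through a SynchronousQueue used as an unbuffered channel. This is not JCSP code; CspSketch and the END marker are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.SynchronousQueue;

// CSP-flavored sketch with plain JDK classes (NOT the JCSP API): two
// "processes" share no mutable state and talk only through a channel.
class CspSketch {
    private static final int END = -1; // hypothetical end-of-stream marker

    static List<Integer> run(int count) throws InterruptedException {
        // SynchronousQueue has no capacity: put() blocks until another thread take()s,
        // which loosely resembles an unbuffered CSP rendezvous channel.
        SynchronousQueue<Integer> channel = new SynchronousQueue<>();

        Thread producer = new Thread(() -> {
            try {
                for (int i = 1; i <= count; i++) {
                    channel.put(i * 10);
                }
                channel.put(END);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();

        // The calling thread plays the consumer process.
        List<Integer> received = new ArrayList<>();
        int value;
        while ((value = channel.take()) != END) {
            received.add(value);
        }
        producer.join();
        return received;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run(3)); // prints [10, 20, 30]
    }
}
```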

0

Java SE 5 and 6 introduced a set of packages in java.util.concurrent.* which provide powerful concurrency building blocks. Check this for more information: http://www.oracle.com/technetwork/articles/java/fork-join-422606.html

0

You might try the Parallel Java 2 Library.

On the website Prof. Alan Kaminsky wrote:

Fast forward to 2013, when I started developing PJ2. Parallel computing had expanded far beyond what it was a decade earlier. Multicore parallel computers were equipped with many more CPU cores and much larger main memory, such that computations that used to require a whole cluster now could be done on a single multicore node. New kinds of parallel computing hardware had become commonplace, notably graphics processing unit (GPU) accelerators. Cloud computing services, such as Amazon's EC2, allowed anyone to run parallel programs on a virtual supercomputer with thousands of cores. New application areas for parallel computing had opened up, notably big data analytics. New parallel programming APIs had arisen, such as OpenCL and NVIDIA Corporation's CUDA for GPU parallel programming, and map-reduce frameworks like Apache's Hadoop for big data computing. To explore and take advantage of all these trends, I decided that a completely new Parallel Java 2 Library was needed.

In early 2013, when PJ2 wasn't yet available (although an earlier version was), I tried the Java Parallel Processing Framework (JPPF). JPPF was okay, but at first glance PJ2 looks interesting.

0

There is a library called Habanero-Java (HJ), developed at Rice University, that is built using lambda expressions and can run on any Java 8 JVM.

HJ-lib integrates a wide range of parallel programming constructs (e.g., async tasks, futures, data-driven tasks, forall, barriers, phasers, transactions, actors) in a single programming model that enables unique combinations of these constructs (e.g., nested combinations of task and actor parallelism).

The HJ runtime is responsible for orchestrating the creation, execution, and termination of HJ tasks, and features both work-sharing and work-stealing schedulers. You can follow the tutorial to set it up on your computer.

Here is a simple HelloWorld example:

import static edu.rice.hj.Module1.*;

public class HelloWorld {

    public static void main(final String[] args) {

        launchHabaneroApp(() -> {

            finish(() -> {
                async(() -> System.out.println("Hello World - 1!"));
                async(() -> System.out.println("Hello World - 2!"));
                async(() -> System.out.println("Hello World - 3!"));
                async(() -> System.out.println("Hello World - 4!"));
            });

        });
    }
}

Each async task can run in parallel with the other async tasks, while the code inside each one runs sequentially. The program doesn't continue past finish until all tasks started within it have completed.

2 Comments

Exception in thread "main" java.lang.RuntimeException: java.lang.Error: Calling function not instrumented . This error is coming on running this code. How to fix?
A caveat I forgot to mention is that HJ-lib requires you to configure the -javaagent JVM option to run it. The Java agent mechanism provides services that allow Java programming language agents to instrument programs running on the JVM. I'm not sure about other IDEs, but in IntelliJ you have to go to Edit Configurations and add -javaagent:$MODULE_DIR$/lib/hjlib-cooperative-0.1.8.jar (or whatever .jar version you have) to VM options.
0

Short answer with example library

If you are interested in parallel processing using Java, I would recommend you to give a try to Hazelcast Jet.

No more words needed from my side. Just check the website and learn from their examples. They give you a pretty solid background and intuition for what it means to process data in parallel.

https://jet.hazelcast.org/
