2

I want to run an algorithm on large graphs concurrently, using multi-core parallelism. I have been working on it for a while, but haven't been able to come up with a good solution.

This is the naive algorithm:

W - a very large number
double weight = 0

while(weight < W)

    - v : get_random_node_from(Graph)

    - weight += calculate(v)
  • I looked into fork-and-join, but can't figure out a way to divide this problem into smaller subproblems.
  • Then I tried using Java 8 streams, for which I need to create a lambda expression. When I tried doing something like this:

double weight = 0 Callable<Object> task = () -> { can not update weight here, as it needs to be final }

My question is, is it possible to update a variable like weight in a lambda method? Or is there a better way in which this problem can be solved?

The closest I have got is by using ExecutorService, but run into the problems of synchronization.

------------EDIT--------------

Here is the detailed algorithm:

In a nutshell, what I am trying to do, is traverse a massive graph, perform an operation on randomly selected nodes(as long as weight < W) and update a global structure Index.

This is taking too long as it doesn't utilize the full power of the CPU.

Ideally, all threads/processes on multiple cores would perform the operations on the randomly selected nodes, and update the shared weight and Index.

Note: It doesn't matter if different threads pick up the same node, as it's random without replacement.

Algorithm:

function Serial () {

List<List<Integer>> I (shared data structure which I want to update)
double weight

//// Task which I want to parallelize

while(weight < W) {

    v : get_random_node_from(Graph)

    bfs(v, affected_nodes) ...// this will fill up affected_nodes by v

    foreach(affected_node in affected_nodes) {

         // update I related to affected_node
         // and do other computation
    }

    weight += affected_nodes.size()

}

///////// Parallelization ends here

use_index(I) // I is passed now to some other method(not important) to get further results

}

The important thing is, all threads update the same I and weight.

Thanks.

5
  • I'm not sure what you're trying to accomplish - if it's to calculate the weight of the entire graph, using Java streams, you would just have each element of the stream be a node of the graph, use map to map it to the weight, then sum the stream. That should be parellizable as well. Commented Jun 7, 2017 at 17:57
  • Let weight < W. You start several calculate(v), but after the first weight += calculate(v) weight becomes >=W. Should other calculation be cancelled, or they can finish their work? If they can finish their work, can and should they add their results to weight? Commented Jun 7, 2017 at 21:44
  • 1
    @AlexeiKaigorodov Whenever weight > W all the processes should stop. I will edit the question to add more details. Commented Jun 7, 2017 at 22:07
  • Callable to pass to executor need not to be a lambda. It can be of any type that implements Callable and so can access to whatever information. Commented Jun 8, 2017 at 8:13
  • Thank you for the suggestion. However, I already tried that, and ran into the synchronization issues(possibly) and get invalidOperationExceptions. Commented Jun 8, 2017 at 18:50

1 Answer 1

1

Well you could wrap that weight into an array of a single element, it's sort of a know trick for this kind of stuff; even done internally by java, like this:

weight[0] = weight[0] + calculate(v);

But there are problems with this, since you are going to run it in parallel. You will not get the result you want since weight[0] is not thread-safe. And you could use some sort of synchronization, but java already has a great solution for that : DoubleAdder that scales far better in contended environments (and multiple cpus).

A trivial and small example:

DoubleAdder weight = new DoubleAdder();

private static int calculate(int v) {
    return v + 1;
}


Stream.of(1, 2, 3, 4, 5, 6, 7, 8, 9)
            .parallel()
            .forEach(x -> {
                int y = calculate(x);
                weight.add(y);
            });

System.out.println(weight); // 54

Then there is the problem of the randomizer that you are going to choose for this: get_random_node_from(Graph). You need to get a random Node indeed, but at the same time you need to get all of them exactly once. But you might not need it if you can flatten all the nodes into a single List let's say.

The problem here is that Graphs are usually traversed in a recursive way, you don't know the exact size of it:

while(parent.hasChildren) {
     traverse children and so on...
}

This will parallelize bad under Streams, you can look yourself at Spliterators#spliteratorUnknownSize. It will grow arithmetically from 1024; that's why my suggestion of flattening the Nodes into a single List, with known size; that will parallelize much better.

Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.