Java - Multithreading one big loop

Question

This is probably a pretty easy question, but as I never worked with threads before I figured it would be best to ask instead of trying to find the optimal solution completely on my own.

I have a giant for loop that runs literally billions of times. On each on loop run, according to the current index, the program calculates a final result in the form of a number. I am only interested in storing the top result(or top x results), and its corresponding index.

My question is simple, what would be the right way running this loop in threads so it uses all the available CPUs/cores.

int topResultIndex;
double topResult = 0;

for (i=1; i < 1000000000; ++i) {
    double result = // some complicated calculation based on the current index
    if (result > topResult) {
        topResult = result;
        topResultIndex = i;
    }
}

The calculation is completely independent for each index, no resources are shared. topResultIndex and topResult will be obviously accessed by each thread though.

* Update: Both Giulio's and rolfl's solution are good, also very similar. Could only accept one of them as my answer.

Is the calculation independent for each index or are there going to be shared resources for the calculations? — Jeffrey
– Jeffrey, Commented Nov 3, 2013 at 2:12
If the loop is CPU-bound, multithreading will increase its speed (by a factor given by Amdahl's law). It won't work if the bottleneck is the memory (because multithreading won't make the RAM run faster) — Giulio Franco
– Giulio Franco, Commented Nov 3, 2013 at 2:19
Well, when I run the loop on a dual core CPU I see the cpu usage going upto exactly 50%, so I can assume the CPU is indeed the bottleneck. Ofc, I'll only know for sure after actually running it in threads. — SportySpice
– SportySpice, Commented Nov 3, 2013 at 2:27
Would recommend looking also at docs.oracle.com/javase/tutorial/essential/concurrency/… — vandale
– vandale, Commented Nov 3, 2013 at 2:53

Giulio Franco · Accepted Answer · 2013-11-03 02:47:01Z

6

Let's assume that the result is computed by a calculateResult(long) method, which is private and static, and does not access any static field, (it can also be non-static, but still it must be thread-safe and concurrently-executable, hopefully thread-confined).

Then, I think this will do the dirty work:

public static class Response {
    int index;
    double result;
}

private static class MyTask implements Callable<Response> {
    private long from;
    private long to;

    public MyTask(long fromIndexInclusive, long toIndexExclusive) {
        this.from = fromIndexInclusive;
        this.to = toIndexExclusive;
    }

    public Response call() {
        int topResultIndex;
        double topResult = 0;

        for (long i = from; i < to; ++i) {
            double result = calculateResult(i);
            if (result > topResult) {
                topResult = result;
                topResultIndex = i;
            }
        }

        Response res = new Response();
        res.index = topResultIndex;
        res.result = topResult;
        return res;
    }
};

private static calculateResult(long index) { ... }

public Response interfaceMethod() {
    //You might want to make this static/shared/global
    ExecutorService svc = Executors.newCachedThreadPool();

    int chunks = Runtime.getRuntime().availableProcessors();
    long iterations = 1000000000;
    MyTask[] tasks = new MyTask[chunks];
    for (int i = 0; i < chunks; ++i) {
        //You'd better cast to double and round here
        tasks[i] = new MyTask(iterations / chunks * i, iterations / chunks * (i + 1));
    }

    List<Future<Response>> resp = svc.invokeAll(Arrays.asList(tasks));
    Iterator<Future<Response>> respIt = resp.iterator();

    //You'll have to handle exceptions here
    Response bestResponse = respIt.next().get();

    while (respIt.hasNext()) {
        Response r = respIt.next().get();
        if (r.result > bestResponse.result) {
            bestResponse = r;
        }
    }

    return bestResponse;
}

From my experience, this division in chunks is much faster that having a task for each index (especially if the computational load for each single index is small, like it probably is. By small, I mean less than half a second). It's a bit harder to code, though, because you need to make a 2-step maximization (first at chunk-level, then at a global level). With this, if the computation is purely cpu-based (does not push the ram too much) you should get a speedup almost equal to 80% the number of physical cores.

answered Nov 3, 2013 at 2:47

Giulio Franco

3,25017 silver badges18 bronze badges

Sign up to request clarification or add additional context in comments.

2 Comments

splinter123 Over a year ago

What about the situation where calculateResult() consists in a network request that involves a bit of waiting (like a second or two). Would you assign a task for each index?

Giulio Franco Over a year ago

Still depends on how many requests you need to make. In general, I would make a 1:1 relationship between tasks and indices, but you don't want to engulf your OS with waiting threads. On a normal desktop/laptop, for network or hdd requests, I'd avoid spawning more than 20-30 threads from a single call. Also consider that the device you need to wait for has its own limits, putting too many requests on it won't help. Anyway, the number of tasks can be much greater than the number of CPUs, when the tasks are IO-bound

rolfl · Accepted Answer · 2013-11-03 02:48:41Z

2

Apart from the observation that a C program with OpenMP or some other parallel computing extensions would be a better idea, the Java way to do it would be to create a 'Future' Task that calculates a subset of the problem:

private static final class Result {
   final int index;
   final double result;
   public Result (int index, double result) {
       this.result = result;
       this.index = index;
   }
}

// Calculate 10,000 values in each thead
int steps = 10000;
int cpucount = Runtime.getRuntime().availableProcessors();
ExecutorService service = Executors.newFixedThreadPool(cpucount);
ArrayList<Future<Result>> results = new ArrayList<>();
for (int i = 0; i < 1000000000; i+= steps) {
    final int from = i;
    final int to = from + steps;
    results.add(service.submit(new Callable<Result>() {
        public Result call() {
              int topResultIndex = -1;
              double topResult = 0;
              for (int j = from; j < to; j++) {
                  // do complicated things with 'j'
                      double result = // some complicated calculation based on the current index
                      if (result > topResult) {
                          topResult = result;
                          topResultIndex = j;
                      }
              }
              return new Result(topResultIndex, topResult);
        }
    });
 }

 service.shutdown();
 while (!service.isTerminated()) {
     System.out.println("Waiting for threads to complete");
     service.awaitTermination(10, TimeUnit.SECONDS);
 }
 Result best = null;
 for (Future<Result> fut : results) {
    if (best == null || fut.result > best.result) {
       best = fut;
    }
 }

 System.out.printf("Best result is %f at index %d\n", best.result, best.index);

Future<Result>

edited Nov 3, 2013 at 2:48

answered Nov 3, 2013 at 2:34

rolfl

17.7k7 gold badges45 silver badges77 bronze badges

3 Comments

SportySpice Over a year ago

Thank you very much for your in-depth solution. Do you think this will run much better in C/C++ ? My entire program is already written in java but I actually thought about writing this part of the program in c++, if it will indeed show a significant improvement.

vandale Over a year ago

Actually the call Runtime.getRuntime().availableProcessors() will return the number of possessors without any external libraries

rolfl Over a year ago

Edited, thanks vandale. @SportySpice - it depends on the situation. Java JIT may be able to compete closely in your particular situation.

Jeffrey · Accepted Answer · 2013-11-03 02:26:30Z

1

The easiest way would be to use an ExecutorService and submit your tasks as a Runnable or Callable. You can use Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors()) to create an ExeuctorService that will use the same number of threads as there are processors.

answered Nov 3, 2013 at 2:26

Jeffrey

44.9k8 gold badges93 silver badges143 bronze badges

2 Comments

SportySpice Over a year ago

Ok but what are my "tasks". Should I just divide the loop? for example if i'm on 2 processors one will run to from 0 to 500000000 and the other from 500000001 to 1000000000? or is there a more appropriate solution? Or do you mean that each loop run itself will be a new runnable? (doesn't sound very smart creating so many objects)

Jeffrey Over a year ago

Your tasks are the long complicated calculation. Make a method like getResult(int index), and use that as your task.

Collectives™ on Stack Overflow

Java - Multithreading one big loop

3 Answers 3

2 Comments

3 Comments

2 Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

3 Answers 3

2 Comments

3 Comments

2 Comments

Your Answer

Sign up or log in

Post as a guest

Related