4

This is probably a pretty easy question, but as I never worked with threads before I figured it would be best to ask instead of trying to find the optimal solution completely on my own.

I have a giant for loop that runs literally billions of times. On each on loop run, according to the current index, the program calculates a final result in the form of a number. I am only interested in storing the top result(or top x results), and its corresponding index.

My question is simple, what would be the right way running this loop in threads so it uses all the available CPUs/cores.

int topResultIndex;
double topResult = 0;

for (i=1; i < 1000000000; ++i) {
    double result = // some complicated calculation based on the current index
    if (result > topResult) {
        topResult = result;
        topResultIndex = i;
    }
}

The calculation is completely independent for each index, no resources are shared. topResultIndex and topResult will be obviously accessed by each thread though.

* Update: Both Giulio's and rolfl's solution are good, also very similar. Could only accept one of them as my answer.

5
  • 3
    Is the calculation independent for each index or are there going to be shared resources for the calculations? Commented Nov 3, 2013 at 2:12
  • The calculation is completely independent for each index Commented Nov 3, 2013 at 2:14
  • If the loop is CPU-bound, multithreading will increase its speed (by a factor given by Amdahl's law). It won't work if the bottleneck is the memory (because multithreading won't make the RAM run faster) Commented Nov 3, 2013 at 2:19
  • Well, when I run the loop on a dual core CPU I see the cpu usage going upto exactly 50%, so I can assume the CPU is indeed the bottleneck. Ofc, I'll only know for sure after actually running it in threads. Commented Nov 3, 2013 at 2:27
  • 2
    Would recommend looking also at docs.oracle.com/javase/tutorial/essential/concurrency/… Commented Nov 3, 2013 at 2:53

3 Answers 3

6

Let's assume that the result is computed by a calculateResult(long) method, which is private and static, and does not access any static field, (it can also be non-static, but still it must be thread-safe and concurrently-executable, hopefully thread-confined).

Then, I think this will do the dirty work:

public static class Response {
    int index;
    double result;
}

private static class MyTask implements Callable<Response> {
    private long from;
    private long to;

    public MyTask(long fromIndexInclusive, long toIndexExclusive) {
        this.from = fromIndexInclusive;
        this.to = toIndexExclusive;
    }

    public Response call() {
        int topResultIndex;
        double topResult = 0;

        for (long i = from; i < to; ++i) {
            double result = calculateResult(i);
            if (result > topResult) {
                topResult = result;
                topResultIndex = i;
            }
        }

        Response res = new Response();
        res.index = topResultIndex;
        res.result = topResult;
        return res;
    }
};

private static calculateResult(long index) { ... }

public Response interfaceMethod() {
    //You might want to make this static/shared/global
    ExecutorService svc = Executors.newCachedThreadPool();

    int chunks = Runtime.getRuntime().availableProcessors();
    long iterations = 1000000000;
    MyTask[] tasks = new MyTask[chunks];
    for (int i = 0; i < chunks; ++i) {
        //You'd better cast to double and round here
        tasks[i] = new MyTask(iterations / chunks * i, iterations / chunks * (i + 1));
    }

    List<Future<Response>> resp = svc.invokeAll(Arrays.asList(tasks));
    Iterator<Future<Response>> respIt = resp.iterator();

    //You'll have to handle exceptions here
    Response bestResponse = respIt.next().get();

    while (respIt.hasNext()) {
        Response r = respIt.next().get();
        if (r.result > bestResponse.result) {
            bestResponse = r;
        }
    }

    return bestResponse;
}

From my experience, this division in chunks is much faster that having a task for each index (especially if the computational load for each single index is small, like it probably is. By small, I mean less than half a second). It's a bit harder to code, though, because you need to make a 2-step maximization (first at chunk-level, then at a global level). With this, if the computation is purely cpu-based (does not push the ram too much) you should get a speedup almost equal to 80% the number of physical cores.

Sign up to request clarification or add additional context in comments.

2 Comments

What about the situation where calculateResult() consists in a network request that involves a bit of waiting (like a second or two). Would you assign a task for each index?
Still depends on how many requests you need to make. In general, I would make a 1:1 relationship between tasks and indices, but you don't want to engulf your OS with waiting threads. On a normal desktop/laptop, for network or hdd requests, I'd avoid spawning more than 20-30 threads from a single call. Also consider that the device you need to wait for has its own limits, putting too many requests on it won't help. Anyway, the number of tasks can be much greater than the number of CPUs, when the tasks are IO-bound
2

Apart from the observation that a C program with OpenMP or some other parallel computing extensions would be a better idea, the Java way to do it would be to create a 'Future' Task that calculates a subset of the problem:

private static final class Result {
   final int index;
   final double result;
   public Result (int index, double result) {
       this.result = result;
       this.index = index;
   }
}

// Calculate 10,000 values in each thead
int steps = 10000;
int cpucount = Runtime.getRuntime().availableProcessors();
ExecutorService service = Executors.newFixedThreadPool(cpucount);
ArrayList<Future<Result>> results = new ArrayList<>();
for (int i = 0; i < 1000000000; i+= steps) {
    final int from = i;
    final int to = from + steps;
    results.add(service.submit(new Callable<Result>() {
        public Result call() {
              int topResultIndex = -1;
              double topResult = 0;
              for (int j = from; j < to; j++) {
                  // do complicated things with 'j'
                      double result = // some complicated calculation based on the current index
                      if (result > topResult) {
                          topResult = result;
                          topResultIndex = j;
                      }
              }
              return new Result(topResultIndex, topResult);
        }
    });
 }

 service.shutdown();
 while (!service.isTerminated()) {
     System.out.println("Waiting for threads to complete");
     service.awaitTermination(10, TimeUnit.SECONDS);
 }
 Result best = null;
 for (Future<Result> fut : results) {
    if (best == null || fut.result > best.result) {
       best = fut;
    }
 }

 System.out.printf("Best result is %f at index %d\n", best.result, best.index);

Future<Result>

3 Comments

Thank you very much for your in-depth solution. Do you think this will run much better in C/C++ ? My entire program is already written in java but I actually thought about writing this part of the program in c++, if it will indeed show a significant improvement.
Actually the call Runtime.getRuntime().availableProcessors() will return the number of possessors without any external libraries
Edited, thanks vandale. @SportySpice - it depends on the situation. Java JIT may be able to compete closely in your particular situation.
1

The easiest way would be to use an ExecutorService and submit your tasks as a Runnable or Callable. You can use Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors()) to create an ExeuctorService that will use the same number of threads as there are processors.

2 Comments

Ok but what are my "tasks". Should I just divide the loop? for example if i'm on 2 processors one will run to from 0 to 500000000 and the other from 500000001 to 1000000000? or is there a more appropriate solution? Or do you mean that each loop run itself will be a new runnable? (doesn't sound very smart creating so many objects)
Your tasks are the long complicated calculation. Make a method like getResult(int index), and use that as your task.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.