How to parallelize a loop in Java

Question

In the following code, a local method is called on every element of a HashSet. If it returns a special value we halt the loop. Otherwise we add every return value to a new HashSet.

HashSet<Object> myHashSet=…; 
HashSet<Object> mySecondHashSet=…; 

for (Object s : myHashSet) {
    Object value = my_method(s);
    if(value==specialValue)
        return value; 
    else 
        mySecondHashSet.add(value);
 }

I’d like to parralelize this process. None of the objects in the HashSet have any objects in common (it’s a tree-like structure) so I know they can run without any synchonization issues. How do I modify the code such that each call of my_method(s) starts a new tread, and also that if one of the threads evaluates to the special values, all the threads halt without returning and the special value is returned?

It isn't exactly what you're asking for, because that isn't entirely clear; but did you want something like - Set<Object> mySecondHashSet = myHashSet.stream().parallel().map(x -> my_method(x)).collect(Collectors.toSet()); if (mySecondHashSet.contains(specialValue)) { return specialValue; } — Elliott Frisch
– Elliott Frisch, Commented Jul 2, 2017 at 16:09
Your claim that "no synchronization is needed" is incorrect. HashSet is not thread-safe; inserting into it in parallel risks lost updates or corruption. If you want to parallelize, you need either insert into a thread-safe set, or collect into thread-local sets and merge the sets. — Brian Goetz
– Brian Goetz, Commented Jul 2, 2017 at 16:23
ahh OK, I meant, none of the objects share any variables. Which data structure should I use instaed of HashSet ? — Rafa
– Rafa, Commented Jul 2, 2017 at 19:00

Artem Barger · Accepted Answer · 2017-07-02 19:50:43Z

1

Having in mind java 8, this could be relatively simple, while it won't preserve your initial code semantics:

In case all you need is to return special value once you hit it

if (myHashSet.parallelStream()
             .map(x -> method(x))
             .anyMatch(x -> x == specialValue)) {

    return specialValue;
}

If you need to keep transformed values until you meet the special value, you already got an answer from @Elliot in comments, while need to mention that semantic is not the same as your original code, since no orderer will be preserved.

While it yet to be checked, but I would expect following to be optimized and stop once it will hit wanted special value:

if (myHashSet.parallelStream()
             .anyMatch(x -> method(x) == specialValue)) {

    return specialValue;
}

edited Jul 2, 2017 at 19:50

answered Jul 2, 2017 at 16:13

Artem Barger

41.3k9 gold badges61 silver badges81 bronze badges

Sign up to request clarification or add additional context in comments.

5 Comments

Rafa Over a year ago

Will this code halt the execution as soon as the special value is encuentered?

Artem Barger Over a year ago

no, it will not. What is primary goal? To check for special value or to store all values until special one found? Or is it both?

Rafa Over a year ago

What happens is, if I find the special value, I don't need to continue the treatement. So it's more efficient to halt as soon as I find it. Continuing might eat away any gain brought about by parralization.

Artem Barger Over a year ago

yes, this is correct, but what is the reason for having mySecondHashSet? With parallel approach you might find special value a way before serial approach.

Rafa Over a year ago

because if I don't find the specail value, I need to keep them all.

JB Nizet · Accepted Answer · 2017-07-02 16:22:47Z

0

I would do that in two passes:

find if any of the transformed set elements matches the special value;
transform them to a Set.

Starting a new thread for each transformation is way too heavy, and will bring your machine to its knees (unless you have very few elements, in which case parallelizing is probably not worth the effort.

To avoid transforming the values twice with my_method, you can do the transformation lazily and memoize the result:

private class Memoized {
    private Object value;
    private Object transformed;
    private Function<Object, Object> transform;

    public Memoized(Object value, Function<Object, Object> transform) {
        this.value = value;
    }

    public Object getTransformed() {
        if (transformed == null) {
            transformed = transform.apply(value);
        }
        return transformed;
    }
}

And then you can use the following code:

Set<Memoized> memoizeds = 
    myHashSet.stream() // no need to go parallel here
             .map(o -> new Memoized(o, this::my_method))
             .collect(Collectors.toSet());

Optional<Memoized> matching = memoized.parallelStream()
    .filter(m -> m.getTransformed().equals(specialValue))
    .findAny();

if (matching.isPresent()) {
    return matching.get().getTransformed();
}

Set<Object> allTransformed = 
    memoized.parallelStream() 
            .map(m -> m.getTransformed())
            .collect(Collectors.toSet());

edited Jul 2, 2017 at 16:22

answered Jul 2, 2017 at 16:17

JB Nizet

694k94 gold badges1.3k silver badges1.3k bronze badges

5 Comments

Rafa Over a year ago

I see, my main objective here really was to optimize execution time. I don't need the memoization because I know for sure that no the same value can't be computed twice since the initial array contains no duplicates.

JB Nizet Over a year ago

You're missing the point of the memoization. It allows avoiding to call my_method twice for the same object (value). Once in the first pass, when checking if a transformed value matches with the special value, and once more in the second pass, when creating the Set.

Rafa Over a year ago

I still need two passes over my elementSet so O(2n). If I can't get a speedup from parallelization I might as well elave the for loop as it is.

JB Nizet Over a year ago

No, you're still missing the point. Thanks to the memoization, each element is the set is passed to my_method at most once. I also don't really understand why you think you wouldn't get a speedup from parallelization. Haven't you noticed that this solution uses parallel streams?

Rafa Over a year ago

I see. Thanks I'll implement this. I wonder, the size of the hash table varies a lot during the execution. Do you think this machinery os "too mcuh" for a small table? I could test the size and only use it above a treachold of size, using the for loop for smaller tables.

Collectives™ on Stack Overflow

How to parallelize a loop in Java

2 Answers 2

5 Comments

5 Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

5 Comments

5 Comments

Your Answer

Sign up or log in

Post as a guest

Related