0

In the following code, a local method is called on every element of a HashSet. If it returns a special value we halt the loop. Otherwise we add every return value to a new HashSet.

HashSet<Object> myHashSet=…; 
HashSet<Object> mySecondHashSet=…; 

for (Object s : myHashSet) {
    Object value = my_method(s);
    if(value==specialValue)
        return value; 
    else 
        mySecondHashSet.add(value);
 }

I’d like to parralelize this process. None of the objects in the HashSet have any objects in common (it’s a tree-like structure) so I know they can run without any synchonization issues. How do I modify the code such that each call of my_method(s) starts a new tread, and also that if one of the threads evaluates to the special values, all the threads halt without returning and the special value is returned?

3
  • It isn't exactly what you're asking for, because that isn't entirely clear; but did you want something like - Set<Object> mySecondHashSet = myHashSet.stream().parallel().map(x -> my_method(x)).collect(Collectors.toSet()); if (mySecondHashSet.contains(specialValue)) { return specialValue; } Commented Jul 2, 2017 at 16:09
  • 1
    Your claim that "no synchronization is needed" is incorrect. HashSet is not thread-safe; inserting into it in parallel risks lost updates or corruption. If you want to parallelize, you need either insert into a thread-safe set, or collect into thread-local sets and merge the sets. Commented Jul 2, 2017 at 16:23
  • ahh OK, I meant, none of the objects share any variables. Which data structure should I use instaed of HashSet ? Commented Jul 2, 2017 at 19:00

2 Answers 2

1

Having in mind java 8, this could be relatively simple, while it won't preserve your initial code semantics:

In case all you need is to return special value once you hit it

if (myHashSet.parallelStream()
             .map(x -> method(x))
             .anyMatch(x -> x == specialValue)) {

    return specialValue;
}

If you need to keep transformed values until you meet the special value, you already got an answer from @Elliot in comments, while need to mention that semantic is not the same as your original code, since no orderer will be preserved.


While it yet to be checked, but I would expect following to be optimized and stop once it will hit wanted special value:

if (myHashSet.parallelStream()
             .anyMatch(x -> method(x) == specialValue)) {

    return specialValue;
}
Sign up to request clarification or add additional context in comments.

5 Comments

Will this code halt the execution as soon as the special value is encuentered?
no, it will not. What is primary goal? To check for special value or to store all values until special one found? Or is it both?
What happens is, if I find the special value, I don't need to continue the treatement. So it's more efficient to halt as soon as I find it. Continuing might eat away any gain brought about by parralization.
yes, this is correct, but what is the reason for having mySecondHashSet? With parallel approach you might find special value a way before serial approach.
because if I don't find the specail value, I need to keep them all.
0

I would do that in two passes:

  1. find if any of the transformed set elements matches the special value;
  2. transform them to a Set.

Starting a new thread for each transformation is way too heavy, and will bring your machine to its knees (unless you have very few elements, in which case parallelizing is probably not worth the effort.

To avoid transforming the values twice with my_method, you can do the transformation lazily and memoize the result:

private class Memoized {
    private Object value;
    private Object transformed;
    private Function<Object, Object> transform;

    public Memoized(Object value, Function<Object, Object> transform) {
        this.value = value;
    }

    public Object getTransformed() {
        if (transformed == null) {
            transformed = transform.apply(value);
        }
        return transformed;
    }
}

And then you can use the following code:

Set<Memoized> memoizeds = 
    myHashSet.stream() // no need to go parallel here
             .map(o -> new Memoized(o, this::my_method))
             .collect(Collectors.toSet());

Optional<Memoized> matching = memoized.parallelStream()
    .filter(m -> m.getTransformed().equals(specialValue))
    .findAny();

if (matching.isPresent()) {
    return matching.get().getTransformed();
}

Set<Object> allTransformed = 
    memoized.parallelStream() 
            .map(m -> m.getTransformed())
            .collect(Collectors.toSet());

5 Comments

I see, my main objective here really was to optimize execution time. I don't need the memoization because I know for sure that no the same value can't be computed twice since the initial array contains no duplicates.
You're missing the point of the memoization. It allows avoiding to call my_method twice for the same object (value). Once in the first pass, when checking if a transformed value matches with the special value, and once more in the second pass, when creating the Set.
I still need two passes over my elementSet so O(2n). If I can't get a speedup from parallelization I might as well elave the for loop as it is.
No, you're still missing the point. Thanks to the memoization, each element is the set is passed to my_method at most once. I also don't really understand why you think you wouldn't get a speedup from parallelization. Haven't you noticed that this solution uses parallel streams?
I see. Thanks I'll implement this. I wonder, the size of the hash table varies a lot during the execution. Do you think this machinery os "too mcuh" for a small table? I could test the size and only use it above a treachold of size, using the for loop for smaller tables.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.