3

Does anybody know if there is a way to return two values from a Java method with (close to) zero overhead? I'm only looking for two values. I have a few use cases: processing an array of bytes (where I need the return value and the next starting position), returning a value together with an error code, and doing some ugliness with fixed-point calculations where I need both the whole and the fractional part.

I'm not above some really ugly hacks. The function is small and Hotspot happily inlines it. So now I just need to get Hotspot to elide any object creation or bit shifting.

If I restrict my returned values to ints, I can pack them into a long. I've tried that, but even after inlining, Hotspot cannot seem to figure out that the bit shifts and masks cancel out, and it happily packs and unpacks the ints into the same values (clearly a place where Hotspot's peephole optimizer needs help). But at least I'm not creating an object.
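
For reference, the packing described above can be sketched roughly as follows. The class and method names (PackedInts, pack, high, low) are made up for illustration and are not from any library; whether the JIT removes the shifts after inlining is exactly the open question here.

```java
// Hypothetical helper; all names are illustrative only.
final class PackedInts {
    private PackedInts() {}

    // Puts 'a' in the upper 32 bits and 'b' in the lower 32 bits.
    static long pack(int a, int b) {
        // The mask keeps the sign extension of a negative 'b' from overwriting 'a'.
        return ((long) a << 32) | (b & 0xFFFFFFFFL);
    }

    // Recovers the first value from the upper half.
    static int high(long packed) {
        return (int) (packed >>> 32);
    }

    // Recovers the second value from the lower half.
    static int low(long packed) {
        return (int) packed;
    }

    public static void main(String[] args) {
        long packed = pack(-7, 42);
        System.out.println(high(packed) + " " + low(packed)); // prints "-7 42"
    }
}
```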

My more difficult case is when one of the items I need to return is a reference and the other is a long or another reference (for the int case, I think I can compress the OOP and use the bit packing described above).

Has anybody tried to get Hotspot to generate garbage-free code for this? Worst case right now is that I have to carry around an object and pass it in, but I'd like to keep it self-contained. Thread locals are expensive (hash lookups), and it needs to be reentrant.

5
  • 2
    concatenate them in a String with a fixed separator, create a composed object which you return, return a Set of objects, ... Commented Nov 17, 2015 at 19:46
  • 4
    @Stultuske all of those approaches generate garbage, especially concatenating your results into Strings. Commented Nov 17, 2015 at 19:47
  • Long shot (sort of), but have you considered not returning anything at all? You can always extract the code after the call (supposedly it's some kind of mapping or accumulation logic). While everybody probably agrees that HotSpot should be able to save the day regardless, simply representing the logic differently might nudge the optimizer in the right direction. (Obviously you wouldn't write the loop itself in CPS or you'd blow the stack; only the individual item processing code and result handling.) Commented Nov 18, 2015 at 0:31
  • @tne In some cases it is mapped across, in other cases (e.g., when doing fixed-width calculations and needing to pass back both the value and the number of decimal positions) it isn't. But I'm intrigued by your idea. Can you please be a little more specific? What would I turn into CPS? Commented Nov 18, 2015 at 14:51
  • You'd probably need to post a separate question with specific code to get a useful answer, but very generally: extract the code after the call and before the end of the loop to a separate class. You'll want to put the mapped dataset or accumulator in a field member "to retrieve later". You then inject an instance of that class into the object of the target method. Instead of returning results, the method will in turn call into that injected code with the results as arguments and return nothing (void). Once you leave the loop, you retrieve the results in the injected object from the caller. Commented Nov 19, 2015 at 1:34
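
A rough sketch of the callback idea from the last comment, under some assumptions: the interface ResultSink, the accumulator SumSink and the varint decoder are all invented for illustration, and the sink is passed as a parameter rather than injected into a field, for brevity. The decoder returns void and hands both results to the sink, which is allocated once outside the loop.

```java
final class CallbackDemo {
    // Hypothetical callback that receives both "return values" as arguments.
    interface ResultSink {
        void accept(int value, int nextPos);
    }

    // Accumulator handed to the decoder; the caller reads the results from its fields.
    static final class SumSink implements ResultSink {
        long sum;
        int pos;

        @Override
        public void accept(int value, int nextPos) {
            sum += value;
            pos = nextPos;
        }
    }

    // Decodes one unsigned LEB128 varint starting at 'pos' and passes
    // the decoded value and the next read position to the sink.
    static void readVarint(byte[] buf, int pos, ResultSink sink) {
        int value = 0;
        int shift = 0;
        byte b;
        do {
            b = buf[pos++];
            value |= (b & 0x7F) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        sink.accept(value, pos);
    }

    // Sums every varint in the buffer; the only allocation is the sink itself.
    static long sumAll(byte[] buf) {
        SumSink sink = new SumSink();
        while (sink.pos < buf.length) {
            readVarint(buf, sink.pos, sink);
        }
        return sink.sum;
    }

    public static void main(String[] args) {
        byte[] data = {0x01, (byte) 0xAC, 0x02, 0x7F}; // encodes 1, 300, 127
        System.out.println(sumAll(data)); // prints 428
    }
}
```

This still creates one object per loop rather than none, so it trades the per-call allocation for a single accumulator.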

3 Answers

7

The -XX:+EliminateAllocations optimization (on by default in Java 8) works fine for that.

Whenever you return new Pair(a, b) right at the end of the callee method and use the result immediately in the caller, the JVM is very likely to perform scalar replacement, provided the callee is inlined.
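
As a minimal sketch of that shape applied to the fixed-point use case from the question: the names FixedPointSplit, Parts and SCALE are hypothetical, and a scale of four decimal digits is assumed. The result object is created in the return statement and consumed straight away in the caller, never stored or passed on. Whether the allocation is actually eliminated still depends on inlining and escape analysis, so it needs to be checked in the generated code.

```java
final class FixedPointSplit {
    // Hypothetical value-like result: final fields, set once in the constructor.
    static final class Parts {
        final long whole;
        final int fraction;

        Parts(long whole, int fraction) {
            this.whole = whole;
            this.fraction = fraction;
        }
    }

    // Assumed scale: four decimal digits.
    static final int SCALE = 10_000;

    // Small callee that constructs the result directly in the return statement.
    static Parts split(long fixed) {
        return new Parts(fixed / SCALE, (int) (fixed % SCALE));
    }

    // Caller that uses the result immediately and lets it go out of scope.
    static long roundTrip(long fixed) {
        Parts p = split(fixed);
        return p.whole * SCALE + p.fraction;
    }

    public static void main(String[] args) {
        Parts p = split(1_234_567L);
        System.out.println(p.whole + " " + p.fraction); // prints "123 4567"
    }
}
```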

A simple experiment shows there's nearly no overhead in returning an object. This is not only an efficient way, but also the most readable one.

Benchmark                        Mode  Cnt    Score   Error   Units
ReturnPair.manualInline         thrpt   30  127,713 ± 3,408  ops/us
ReturnPair.packToLong           thrpt   30  113,606 ± 1,807  ops/us
ReturnPair.pairObject           thrpt   30  126,881 ± 0,478  ops/us
ReturnPair.pairObjectAllocated  thrpt   30   92,477 ± 0,621  ops/us

The benchmark:

import org.openjdk.jmh.annotations.*;
import org.openjdk.jmh.infra.Blackhole;

import java.util.concurrent.ThreadLocalRandom;

@State(Scope.Benchmark)
public class ReturnPair {
    int counter;

    @Benchmark
    public void manualInline(Blackhole bh) {
        bh.consume(counter++);
        bh.consume(ThreadLocalRandom.current().nextInt());
    }

    @Benchmark
    public void packToLong(Blackhole bh) {
        long packed = getPacked();
        bh.consume((int) (packed >>> 32));
        bh.consume((int) packed);
    }

    @Benchmark
    public void pairObject(Blackhole bh) {
        Pair pair = getPair();
        bh.consume(pair.a);
        bh.consume(pair.b);
    }

    @Benchmark
    @Fork(jvmArgs = "-XX:-EliminateAllocations")
    public void pairObjectAllocated(Blackhole bh) {
        Pair pair = getPair();
        bh.consume(pair.a);
        bh.consume(pair.b);
    }

    public long getPacked() {
        int a = counter++;
        int b = ThreadLocalRandom.current().nextInt();
        return (long) a << 32 | (b & 0xffffffffL);
    }

    public Pair getPair() {
        int a = counter++;
        int b = ThreadLocalRandom.current().nextInt();
        return new Pair(a, b);
    }

    static class Pair {
        final int a;
        final int b;

        Pair(int a, int b) {
            this.a = a;
            this.b = b;
        }
    }
}

5 Comments

Unless this is new, I tried this in Java 8 (around micro version 20, I think), and the JMH assembly dumps showed that the object creation was not being elided. Maybe I just needed to make things clearer for Hotspot (e.g., make sure the return creates the object with constructor values instead of creating it at the top of the function and filling in the instance variables). I'll definitely try it again though. Did you look at the assembly by any chance?
Does using final help at all in the final compilation pass? (AFAIK, it helps in the earlier tiers, but the last tier always seems to get to the same place with or without final.)
I'm actually shocked that Hotspot can't figure out the packing/unpacking is a NOP after inlining.
@JasonN Yes, I've looked into the generated assembly to verify that the allocation exists only in the last benchmark. The final modifier on non-static fields does not usually affect performance, including in this particular case. But I think it's good practice anyway to mark fields final to indicate that such classes are used more like value types.
The compiler is able to inline the Pair instance creation because the object is simple enough. But when using more complex objects, like java.util.AbstractMap.SimpleEntry (which is basically also a pair of values), the compiler is not able to inline the instance creation, so the performance is even worse than in pairObjectAllocated(Blackhole).
0

Your described solution is pretty much as good as you can get in Hotspot -- passing in an object to hold the return values and mutating it. (Java 10 value types might do something better here, but I don't think that's even in the prototype stage yet.)

That said: small short-lived objects are not actually that far from zero-overhead. Garbage collection of short lived objects is deliberately extremely cheap.

3 Comments

I'm actually shocked that packing the ints into a long and then unpacking them doesn't get optimized away once the function gets inlined. When I looked at the assembly dump with JMH, it literally shifts and ORs the values together, then immediately unshifts and masks. Sometimes Hotspot is magic, other times it's lame. Also, allocation is not cheap enough. While it is potentially just a TLAB bump, it can be much more, and it also has the bzero overhead.
@JasonN if you can do that, the shifting and unshifting is still likely to be cheaper than an object allocation, so that'd be your best option.
It definitely is. It's just that for some of my return cases I can't seem to find a way to do that. Compressing two oops, packing them into a long, and then unpacking and uncompressing on the other side is also doable, but still costs about 20 instructions (at least). Not to mention there is no way to factor that code out into a library, since the library code would need to return two values, of course :)
0

I've had to deal with that problem, and found that the best way is to instantiate a simple final class with public fields, then pass it as a parameter to your method. Write the results into that instance.

If you have a loop, try to reuse that instance for as long as possible.
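
A minimal sketch of this approach, using the byte-array use case from the question. The names HolderDemo, ParseResult, parseInt and sumCsv are invented for illustration, and the input is assumed to be ASCII digits separated by single-byte separators.

```java
final class HolderDemo {
    // Hypothetical out-parameter: a final class with public fields.
    static final class ParseResult {
        public int value;
        public int nextPos;
    }

    // Parses ASCII decimal digits starting at 'pos' and writes the value and
    // the index of the first non-digit byte into 'out'.
    static void parseInt(byte[] buf, int pos, ParseResult out) {
        int v = 0;
        while (pos < buf.length && buf[pos] >= '0' && buf[pos] <= '9') {
            v = v * 10 + (buf[pos] - '0');
            pos++;
        }
        out.value = v;
        out.nextPos = pos;
    }

    // Sums separator-delimited numbers, reusing one holder for every call.
    static int sumCsv(byte[] buf) {
        ParseResult r = new ParseResult(); // single allocation, outside the loop
        int sum = 0;
        int pos = 0;
        while (pos < buf.length) {
            parseInt(buf, pos, r);
            sum += r.value;
            pos = r.nextPos + 1; // skip the separator
        }
        return sum;
    }

    public static void main(String[] args) {
        byte[] data = {'1', '2', ',', '3', '0', ',', '5'};
        System.out.println(sumCsv(data)); // prints 47
    }
}
```

Because the holder is a local variable of the caller, nested or recursive calls can each use their own instance, which keeps the code reentrant without thread locals.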

In Java 7 (when I did this), having setters and getters had a very small overhead. The same goes for instantiating a new object on every loop iteration.

1 Comment

Getters and setters (that are never overridden) should have zero overhead after Hotspot gets through with them (which might not happen until the last compilation pass when using tiers). It happily inlines them away. That said, I never use getters and setters (I always make the fields protected or public instead).
