
I saw a presentation the other day in which the speaker had used the technique outlined in McIlroy's paper A Killer Adversary for Quicksort to generate an input to Arrays.sort for primitive types that triggers O(n²) behavior. The sequence causes the pivot selection to reduce the array size by only a constant at each level of recursion, which makes the Java Arrays.sort function overflow the stack.
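To make the failure mode concrete, here is a small sketch (not McIlroy's actual adversary, and not the JDK's pivot strategy): a quicksort that naively picks the first element as pivot degrades to quadratic behavior on already-sorted input, because every partition peels off only one element. The class name and comparison counter are illustrative.

```java
// Illustration only: first-element pivot quicksort hits its worst case
// on sorted input, doing ~n^2/2 comparisons and recursing n levels deep.
public class QuadraticQuicksort {
    static long comparisons = 0;

    static void quicksort(int[] a, int lo, int hi) {
        if (lo >= hi) return;
        int pivot = a[lo];                 // naive pivot: first element
        int i = lo;
        for (int j = lo + 1; j <= hi; j++) {
            comparisons++;
            if (a[j] < pivot) {
                i++;
                int t = a[i]; a[i] = a[j]; a[j] = t;
            }
        }
        int t = a[lo]; a[lo] = a[i]; a[i] = t;
        quicksort(a, lo, i - 1);           // on sorted input this side is empty
        quicksort(a, i + 1, hi);           // ...and this side shrinks by only 1
    }

    public static void main(String[] args) {
        int n = 2000;
        int[] sorted = new int[n];
        for (int k = 0; k < n; k++) sorted[k] = k;
        quicksort(sorted, 0, n - 1);
        // n(n-1)/2 = 1,999,000 comparisons, versus ~n log n for a good pivot
        System.out.println(comparisons);
    }
}
```

McIlroy's adversary is more sophisticated: it works against median-of-three and similar pivot schemes by deciding element values lazily as the sort compares them, but the end result is the same linear recursion depth shown above.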

According to the source files from the JDK, the Arrays.sort1 quicksort implementation has no guard against stack overflow. It is always possible to make quicksort stack-safe by not firing off two recursive calls: instead, recurse only on the smaller subarray and use a while loop to reuse the current stack frame for the larger one. This causes minimal performance degradation and makes it impossible to cause a stack overflow for any reasonably sized input, since the stack depth never exceeds O(log n) frames on an input of size n. Alternatively, the authors could have used the introsort algorithm, which modifies quicksort to switch to a worst-case O(n log n) sorting algorithm when the quicksort recursion depth exceeds some limit.

Is there any reason why the authors of Arrays.sort didn't opt to do this? It seems like a serious problem that a built-in sorting algorithm can cause a stack overflow, as it makes it possible to launch a DoS attack against any system that sorts untrusted input by triggering repeated stack overflows.

  • To know for sure we would need to ask Vladimir Yaroslavskiy, Jon Bentley, or Josh Bloch. However, in Java 1.7 the sort1 method was removed and replaced with DualPivotQuicksort, but I am not good enough at this stuff to know whether that is better than the old approach. Commented May 10, 2013 at 23:35

1 Answer

Why? Because solving the problem would be overkill.

The algorithm used will behave well in all but exceptionally unusual circumstances, and if those circumstances are more likely than usual to occur, the situation will be guarded against externally. That is why the API documentation specifies the algorithm used behind the scenes: so you can defend against it.

The chances of the specific order that breaks the algorithm being presented are vanishingly small.

I expect that if you looked carefully enough, you would find datasets that break almost all of the standard JVM structures. What is the cost of protecting against them, and is that cost worth the effort and the inevitable degradation of the algorithm due to the defensive measures?

