2

I am working on a list of integers from a file. And I have to use a sorting algorithm to categorize them in descending order. I am familiar with the run time of a few sorting algorithms and I know the use of them is situational. So my question is: What would be the quickest sorting algorithm for a list of ANY size that is already 90% sorted? (in my file I have 10.000 entries but 9.500 of them are already sorted).

Thank you,

13
  • As a start, why don't you benchmark some? Insertion sort works well for small nearly sorted lists..but is it the fastest? Commented Aug 25, 2013 at 1:34
  • my personal any case favorite is an introsort algorythm, if you know there is a pretty high percent of almost sorted lists, you can just set up the heapsort walk in. Commented Aug 25, 2013 at 1:36
  • 3
    Depending on the platform, the default sort may already be adaptive. For example, Python and Java 7 use Timsort, which is a mergesort that takes advantage of runs already present in the input. Check if you can just use an appropriate library function. Commented Aug 25, 2013 at 1:44
  • 2
    Insertion Sort seems to be popular for this type of input. stackoverflow.com/questions/1513566/… stackoverflow.com/questions/220044/… Commented Aug 25, 2013 at 1:45
  • 1
    a good link to see your self which sorting algorithm is best: which in my opinion is insert sorting. Go to link: sorting-algorithms.com Commented Aug 25, 2013 at 5:35

3 Answers 3

5

The Timsort algorithm as developed for Python, (and now used in Java), has optimisations to deal with partially sorted "runs" built in.

Sign up to request clarification or add additional context in comments.

4 Comments

The following post on Timsort in C# has statistics comparing Timsort to C#'s native modified Quicksort routine for random and partially sorted data as well as the C# code for Timsort: timsort4net.codeplex.com
I recently benchmarked the C# Timsort implementation from timsort4net.codeplex.com and was surprised to find it about 3x slower than .NET 4.0's default Array.Sort() call for all array sizes > 64. This held even over varying degrees of array disorder from about 10% random to 100% random. Both .NET and Timsort times reduced with less initial disorder, but the ratio remained about 1:3 in all cases. There may be one or more serious implementation faults with this code.
It turns out I was actually benchmarking against the .NET 4.5 introspective sort, which is an improvement over earlier implementations. So my benchmarks showed the .NET 4.5 introspective sort to be about 3x faster than the timsort4net.codeplex.com implementation.
4

Insertion sort should be fine, if you choose to code it yourself rather than using the language provided sort functions.

  1. It performs really well when the list is almost sorted (Though it is of Order O(N^2), if the list is almost sorted, the inner loop will not be executed often)
  2. It is pretty easy to implement.

Presenting the Python implementation of Insertion Sort.

Program

def InsertionSort(Array):
    Total = 0
    for i in xrange(1, len(Array)):
        j, I, = i - 1, i
        while j >= 0 and Array[I] > Array[j]:
            Array[I], Array[j] = Array[j], Array[I]
            j, I, Total = j - 1, I - 1, Total + 1
    print "Insertion Sort Total Operations :", Total

Output

Worstcase

TestArray  = range(1, 11)
InsertionSort(TestArray)
Insertion Sort Total Operations : 45

Bestcase

TestArray  = [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
InsertionSort(TestArray)
Insertion Sort Total Operations : 0

90% Sorted Array

TestArray  = [1, 9, 8, 7, 6, 5, 4, 3, 2, 10]
InsertionSort(TestArray)
Insertion Sort Total Operations : 17

Half Sorted Array

TestArray  = [10, 9, 8, 7, 6, 1, 2, 3, 4, 5]
InsertionSort(TestArray)
Insertion Sort Total Operations : 10

1 Comment

But, of course, the above Python code given should only serve as pseudo code to aid in implementing insertion sort in other languages. In Python use the built-in sort which uses Timsort, which is even better for partially sorted data.
0

For its std::sort C++ uses introspective sort, where the array/list is first sorted using quicksort for a certain depth of recursion followed by heapsort. I dunno about 90% but heapsort seems to perform well on already sorted arrays/lists... so I'd recommend you try that.

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.