I am a newbie to studying algorithms - nor am I a computer science grad.
However,while reading through the linear sorting non-comparison algorithms,I could understand that the radix sort is a extension of the counting sort.
What I am unclear about is the limitation of counting sort.
Why would I go for radix sort when counting sort seems to serve the purpose where ever I need to avoid a O(n*logn) comparison?
It does seem to be a much simpler implementation for sure.
-
1Counting sort only works when the range of the possible values is reasonably tight, i.e. it wouldn't be very useful if the values you wanted to sort could be in the range 0 to 2^32 - 1 as you'd need many GB of RAM.James– James2013-07-14 17:42:20 +00:00Commented Jul 14, 2013 at 17:42
-
Counting sort 1, 2, 3 will run in very different time to 1,2, 783837. Comparison sorts are more predictable.Peter Webb– Peter Webb2013-07-15 11:46:14 +00:00Commented Jul 15, 2013 at 11:46
5 Answers
Imagine someone gave you a list of integers to sort. You know nothing about it except that it contains integers.
If you're lucky, the list might contain numbers within a fairly tight bound. If you're sorting integers that are all between -100 and 100, creating an array with that size in order to do counting sort wouldn't be bad at all.
But if even one number is very large or very small, you now have to extend the bounds of the array in order to do counting sort on the whole input. If you really want to sort all possible integers (and you don't know the range of the values before you create the array, unless you go find it first), you would need to make an array of size 2 * max_int (for negative and positive integers).
Radix sort is good because you never need to create an array with size greater than the range of digits (0-9).
Comments
The counting sort algorithms (including Radix) are applicable only only for countable elements. Unfortunately real numbers are not countable, so you are unable to sort 'float' or 'double' values easily. Imagine that you need to sort a list of measured temperatures.
Now regarding countable amounts (like integers), you have a basic mistake assuming that getting an element from array is O(1). This is not true. When you have array of size N the cost of setting the pointer into this array is O(log(N)). In other words to access the element Array[i] you need to define the 'i' and for defining the value of 'i' you need to set log(i) bits. As long as N is small (say 200 for sorting values of between -100..100 using counting sort) we assume than log(N) is constant and neglect it. But if you want to sort integers than your counting array will be large (size of: 2*MAX_INT) log(2*MAX_INT) might be a large number (like 32). So imagine that you have an array of size 100: A[100] of integers. Using O(N*log(N)) sort requires O(100*log(100)) comparisons. But when using counting sort You create a counting array of huge size (Say 2^64 for 64 bit integers integers) Your total time is O(N*log(2^64)) which is actually more than O(100*log(100)). Crazy as it sounds this is true. And think about the fact that you need to set the entire counting array to zero before you start counting - that is 2^64 operations which is much more than entire O(100*log(100))... And also think about a huge memory waste...
In conclusion: Even if you have infinite amount of memory to use the running time is not really O(N). It is in fact the cost of zeroing the counting array and performing the count:
O(MAX_INT) + O(N*log(MAX_INT))
Usually This is much more than O(N*log(N)) for any reasonable N so counting sort is impractical. The only case when it is practical is when the range of the values is small
(like -100..100) and
O(MAX_INT) + O(N*log(MAX_INT))
becomes O(200) + O(N*log(200)) ~ O(N)
Radix sort enables you to save some memory and the cost of zeroing the huge counting array, but still you don't really loose the log() factor because a number of range -X..X has log(X) digits and you are still have the log(MAX_INT) which is usually larger than log(N) , Where N is the size of the array you want to sort.
4 Comments
Counting sort has complexity O(max - min) where min,max are the min and max integers you want to sort. If this range is much larger than the size of the array you want to sort, then radix sort is better.
3 Comments
I disagree with some of these answers. First Radix Sort can sort doubles and floats. I've done it, and it's still much faster than comparison sorts.
To the op, you can learn more by seeing this post that I wrote earlier. It will always be the best linear time sort.
Comments
When people talk about algorithms they usually express the algorithm's performance in time and memory requirements.
As you observe counting sort is great.It runs in linear time.
But it needs O(N) memory requirement as well.
When we look for algorithms, we often see this trade-off between memory and time complexity.
By using more memory we can get a better running time.
So though counting sort has a better time complexity, it needs space proportional to the input size, which makes it impractical to use in most cases.
As a more serious issue, you need to know the range of numbers in input beforehand.Sure,it's simple and elegant to code it, but when it comes to practical use, it's limited.