3

I am a newbie to studying algorithms, and I am not a computer science grad.
However, while reading through the linear-time non-comparison sorting algorithms, I came to understand that radix sort is an extension of counting sort.
What I am unclear about is the limitation of counting sort.
Why would I go for radix sort when counting sort seems to serve the purpose wherever I need to avoid an O(n*log n) comparison sort?
Counting sort certainly seems to be the much simpler implementation.

2 Comments

  • Counting sort only works when the range of the possible values is reasonably tight, i.e. it wouldn't be very useful if the values you wanted to sort could be in the range 0 to 2^32 - 1, as you'd need many GB of RAM. Commented Jul 14, 2013 at 17:42
  • Counting sort on 1, 2, 3 will run in a very different time than on 1, 2, 783837. Comparison sorts are more predictable. Commented Jul 15, 2013 at 11:46

5 Answers

3

Imagine someone gave you a list of integers to sort. You know nothing about it except that it contains integers.

If you're lucky, the list might contain numbers within a fairly tight bound. If you're sorting integers that are all between -100 and 100, creating an array with that size in order to do counting sort wouldn't be bad at all.

But if even one number is very large or very small, you now have to extend the bounds of the array in order to do counting sort on the whole input. If you really want to sort all possible integers (and you don't know the range of the values before you create the array, unless you go find it first), you would need to make an array of size 2 * max_int (for negative and positive integers).

Radix sort is good because you never need to create a counting array larger than the range of a single digit (0-9), no matter how large the values themselves are.
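For illustration, here is a minimal counting sort sketch in Python for exactly that bounded case (the -100..100 range is taken from this answer; the function name and bounds are only for the example):

    def counting_sort(values, lo=-100, hi=100):
        """Counting sort for integers known to lie in [lo, hi]."""
        counts = [0] * (hi - lo + 1)              # one slot per possible value
        for v in values:
            counts[v - lo] += 1                   # tally occurrences
        result = []
        for offset, count in enumerate(counts):
            result.extend([lo + offset] * count)  # emit each value `count` times
        return result

    print(counting_sort([3, -7, 100, 0, -7]))     # [-7, -7, 0, 3, 100]

The array of 201 counters is cheap here, but the same code with lo = -2**31 and hi = 2**31 - 1 would need billions of counters before it even reads the first input value.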



2

The counting sort algorithms (including radix sort) are applicable only to countable elements. Unfortunately, real numbers are not countable, so you cannot easily sort 'float' or 'double' values. Imagine that you need to sort a list of measured temperatures.

Now, regarding countable values (like integers): you are making a basic mistake in assuming that accessing an array element is O(1). This is not true. When you have an array of size N, the cost of forming the index into it is O(log(N)). In other words, to access the element Array[i] you need to specify 'i', and specifying the value of 'i' takes log(i) bits. As long as N is small (say 200, for sorting values between -100 and 100 with counting sort), we treat log(N) as a constant and neglect it. But if you want to sort arbitrary integers, your counting array will be huge (size 2*MAX_INT), and log(2*MAX_INT) can be a large number (like 32).

So imagine you have an array of 100 integers, A[100]. An O(N*log(N)) sort requires O(100*log(100)) comparisons. But with counting sort you create a counting array of huge size (say 2^64 for 64-bit integers), and your total time is O(N*log(2^64)), which is actually more than O(100*log(100)). Crazy as it sounds, this is true. And consider that you need to set the entire counting array to zero before you even start counting - that is 2^64 operations, far more than the entire O(100*log(100))... Not to mention the huge memory waste.

In conclusion: even if you have an infinite amount of memory to use, the running time is not really O(N). It is in fact the cost of zeroing the counting array plus the cost of performing the count:

O(MAX_INT) + O(N*log(MAX_INT))

Usually this is much more than O(N*log(N)) for any reasonable N, so counting sort is impractical. The only case where it is practical is when the range of the values is small (like -100..100) and

O(MAX_INT) + O(N*log(MAX_INT))

becomes O(200) + O(N*log(200)) ~ O(N)

Radix sort lets you save some memory and the cost of zeroing the huge counting array, but you don't really lose the log() factor, because a number in the range -X..X has on the order of log(X) digits, so you still carry the log(MAX_INT) factor, which is usually larger than log(N), where N is the size of the array you want to sort.
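To make the comparison concrete, here is a minimal LSD radix sort sketch in Python (assuming non-negative integers; bucket lists stand in for the per-digit counting array). Each pass needs only `base` buckets, but the number of passes grows with the number of digits of the largest value, which is exactly the log(MAX) factor described above:

    def radix_sort(values, base=10):
        """LSD radix sort for non-negative integers: one stable pass per digit."""
        if not values:
            return []
        values = list(values)
        place = 1
        while place <= max(values):
            buckets = [[] for _ in range(base)]   # only `base` slots, never 2*MAX_INT
            for v in values:
                buckets[(v // place) % base].append(v)
            values = [v for bucket in buckets for v in bucket]
            place *= base
        return values

    print(radix_sort([170, 45, 75, 90, 802, 24, 2, 66]))  # [2, 24, 45, 66, 75, 90, 170, 802]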

4 Comments

"When you have array of size N the cost of setting the pointer into this array is O(log(N))". Could you explain further?
Imagine you have an array of size 2^1000. In order to access it at position 'i', you have to specify the variable 'i', and 'i' must be at least 1000 bits long to address such a long array. So you perform at least 1000 actions (setting bits to ones and zeros) for every access to the array. 1000 is exactly log(2^1000).
Ok, you are talking about very big numbers then. But for more normal numbers, i.e. < 2^32, I think it is "safe" to assume that accessing a vector v[i] is O(1), because you are adding an address and an integer, which are independent of the length of the vector (i.e. N). If we want to deal with big O at the bit level, we cannot even assume that an assignment is O(1), but that is not very useful for studying the complexity of sorting algorithms.
This is true, but when you need to sort an array of 10,000 elements, log(N) is a little over 13, so we can treat it as O(1) while still remembering the log(N). My point was that the counting algorithm may use a huge amount of memory - so huge that the cost of forming an index becomes higher than the log(N) of the size of the original array. Note: for any practical use in an average programmer's life, log(N) is bounded by 31 == log(MAX_INT), so we can always discard log(N) as a constant. But it is important to keep it in mind when trying to understand the theory.
1

Counting sort has complexity O(max - min), where min and max are the smallest and largest integers you want to sort. If this range is much larger than the size of the array you want to sort, then radix sort is better.

3 Comments

More accurately, it has a space complexity O(max-min), but a time complexity O(size of array)
@Boris I always assume that allocating space takes linear time in the amount of space; just a personal preference of mine in complexity analysis.
Good point :-) But well... the size n of the array to sort may still be much bigger than (max-min) ;-) . So the time complexity is O(max-min+n) if allocating is linear, or O(n) if allocating is constant time (that may be possible; zero-initialization of a vector can be done in constant time with some tricks, if I remember correctly).
1

I disagree with some of these answers. First, radix sort can sort doubles and floats. I've done it, and it's still much faster than comparison sorts.

To the OP: you can learn more by reading this post that I wrote earlier. It will always be the best linear-time sort.

How to improve on this implementation of the radix-sort?
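For reference, the usual way to radix sort floating-point values is to first map each float to an unsigned integer key whose ordering matches the float ordering, then radix sort the keys. Here is a minimal Python sketch of that IEEE-754 bit mapping (the helper name float_key is illustrative, not taken from the linked post; NaNs are not handled):

    import struct

    def float_key(x):
        """Map a double to a 64-bit unsigned key that sorts in the same order."""
        bits = struct.unpack("<Q", struct.pack("<d", x))[0]
        if bits >> 63:                            # negative: flip all bits
            return bits ^ 0xFFFFFFFFFFFFFFFF
        return bits | (1 << 63)                   # non-negative: set the sign bit

    data = [3.5, -0.25, 0.0, -7.0, 2.0]
    print(sorted(data, key=float_key))            # [-7.0, -0.25, 0.0, 2.0, 3.5]

Here sorted() stands in for the radix pass just to show that the keys order correctly; in a real radix sort you would bucket the 64-bit keys byte by byte.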


0

When people talk about algorithms, they usually express an algorithm's performance in terms of time and memory requirements.
As you observe, counting sort is great: it runs in linear time.
But it needs extra memory as well - proportional to the range of the values being sorted.
When we look at algorithms, we often see this trade-off between memory and time complexity: by using more memory we can get a better running time.
So although counting sort has a better time complexity, it needs space proportional to the range of the input values, which makes it impractical to use in most cases.
A more serious issue is that you need to know the range of the input numbers beforehand. Sure, it's simple and elegant to code, but when it comes to practical use, it's limited.

