12

What are efficient ways to sort arrays that have mostly a small set of duplicated elements? That is, a list like:

{ 10, 10, 55, 10, 999, 8851243, 10, 55, 55, 55, 10, 999, 8851243, 10 }

Assuming that the order of equal elements doesn't matter, what are good worst-case/average-case algorithms?

4
  • The worst-case for all of these will be the same as for normal sorting algorithms since you haven't defined how "duplicate" the list has to be. Of course, there may be ones that do better average-case. Commented Nov 18, 2011 at 5:28
  • I'd be tempted to try insert sort with a skip list Commented Nov 18, 2011 at 5:29
  • How small is "small"? If it's really only a dozen or two items, something simple like selection sort will be hard to beat. Commented Nov 18, 2011 at 5:42
  • How about quick sort with 3way partitioning? Commented Mar 11, 2015 at 3:46

5 Answers 5

19

In practice, you can first iterate through the array once and use a hash table the count the number of occurrences of the individual elements (this is O(n) where n = size of the list). Then take all the unique elements and sort them (this is O(k log k) where k = number of unique elements), and then expand this back to a list of n elements in O(n) steps, recovering the counts from the hash table. If k << n you save time.

Sign up to request clarification or add additional context in comments.

Comments

5

I would try Counting sort with some mapping function. Ie. you wont use the frequencies array of size equal to the range of elements, instead you would iterate over the array, write down distinct elements and use them in a mapping function to the array of frequencies.

This way the algorithm has only one extra iteration and a mapping function, which should work in a constant time (using some kind of hash table). The complexity of this approach would be O(n), which should be optimal.

1 Comment

I am surpried that this last answer has zero count of usefulness. it's the best answer here as it shows time complexity = O(n) and space complexity = O(k).
2

Not the best algorithm, but simple:
You can put everything in a trie, and have the leaves be counters. That should take O(n*m) where n is the number of elements and m is the size of the biggest element (typically that would be a constant, but not necessarily). Then pre-order traverse the tie, outputting counter elements of the current key when you hit a leaf. That should take only O(n+p) where p is the size of the trie, which should be tiny compared to n.

Comments

1

Implementation in C++ based on algo as suggested by @Antti Huima

  • Count frequencies and store in hashtable.
  • sort elements in hashtable.
  • overwrite input array with sorted elements depending on frequencies.
#include <unordered_map>
#include <map>
// Modifies input array to a sorted array
// Complexity: O(n+(k*log(k))) where 'k' = number of unique elements input array
template <typename Datatype>
void SortArrayWithDuplicates(std::vector<Datatype>& in_seq) {
  std::unordered_map<Datatype, int> key_counts_map;
  // Count freqs O(n)
  for (const auto& itr: in_seq)
      key_counts_map[itr] += 1;

  // Sort elements by inserting into a map O(k*log(k))
  std::map<Datatype, int> key_counts_sorted_map;
  for (auto const& itr: key_counts_map)
      key_counts_sorted_map.insert(std::make_pair(itr.first, itr.second));

  auto AlwaysTrue = [](Datatype i)->bool{return true;};
  auto seq_itr = std::begin(in_seq);
  // Update input sequence with new sorted values
  for (auto const& itr: key_counts_sorted_map) {
      std::replace_if(seq_itr, seq_itr+itr.second, AlwaysTrue, itr.first);
      seq_itr += itr.second;
  }
}

Comments

0

IMO Pidgeonhole sort is a good example for such data.

I will clarify a bit: if you know that quantity of the unique elements in the array is reasonable and you know that there are a lot of duplicates I would think of implementing something like counting sort but make list of "buckets" dynamic. After the first pass you will get rid of duplicates, then sort array without duplicates with some good sorting algorithm and then restore sorted array in a way like counting sort does.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.