5

I am attempting to solve the following assignment: I'm given an array of n elements. it is known that not all keys of the array are distinct, but it is given that we have k distinct elements (k<=n of course). the assignment is to do a stable sort of the array in O(n log(log n)) worst case while k=O(log n). I'm allowed to use O(n) extra memory.

My current solution is described below:


Create a hash table with chaining of size k that does the following: if the hash function tries to insert an element to a place that already has a value in it - it checks if they are equal - if they are it adds it to the list, if not it starts moving in the array until it finds a place that has that same value or an empty space(which ever comes first).

This way the lists in each place only contains elements with equal keys. the insertion to the hashtable is from start to finish in the original array so each list is stably sorted. Then sort the array in the hash table with mergeSort (for lists we treat the first element as just one and move it).

After we are done merge sorting we copy the elements back to the original array by order and whenever we meet a list we copy each element by order.

Here is what I'm not sure about:

Is it true to say that because the hash table is size k and we only have k different elements, uniform hashing promises that the amount of time the hash function will try to give different values the same place in the array is negligible and therefore it's build time complexity is O(n)?

Because if so it seems the algorithm's runtime is O(n + k log k) = O(n + log n*log(log n)). which is definitely better then O(n log k) which what was required.

3
  • Not a duplicate question but might be relevant to read for OP: stackoverflow.com/questions/17041726/… Commented Jun 12, 2018 at 22:17
  • Assuming a good hashing function, building the hash table is O(n). Is there a particular reason you selected merge sort? That's going to require O(n) additional memory (or possibly O(2*k) additional memory if you do it indirectly). Stability of the sorting algorithm is a non-issue because you've guaranteed that the keys are distinct. Commented Jun 12, 2018 at 22:22
  • If it is known that all the elements are not distinct then 'k<n' and not '<=' Commented Jun 12, 2018 at 23:50

2 Answers 2

2

I think you're on the right track with the hash table, but I don't think you should insert the elements in the hash table, then copy them out again. You should use the hash table only to count the number of elements for each distinct value.

Next you compute the starting index for each distinct value, by traversing all values in order and adding the previous element's count to its start index:

  • start index for element i = start index for element i-1 + count for element i-1.

This step requires sorting the k elements in the hash table, which amounts to O(k log k) = O(log n log log n) operations, much less than the O(n) for steps 1 and 3.

Finally, you traverse your input array again, look it up in the table, and find the location in the output array for it. You copy the element, and also increase the start index for elements of its value, so that the next element will be copied after it.

If the comparison values for the items are consecutive integers (or integers in some small range), then you can use an array instead of a hash table for counting.

This is called counting sort.

Sign up to request clarification or add additional context in comments.

6 Comments

But counting sort doesn't work so well when the things you're comparing are keys, rather than items. For example, imagine sorting a list of n transactions that belong to k accounts. Each transaction is different, but you're sorting by account number. I don't see how your counting sort example can work.
@JimMischel: Why can't you count account numbers? You count the number of transactions for each account number, then you determine starting indices for each account number in the output array, then you copy over the transactions to their sorted location. Counting sort is not only for integers, see the Wikipedia article linked.
Ahhhh ... I see. Re-reading your description, I see that you keep a "next index" position in the hash table, so that when you do a sequential scan of the list you can put each element in its place. Sorry, didn't read closely enough.
"traversing all values in order" seems to be doing some unaccounted work :). To get a guaranteed runtime, instead of a hash table, use a sorted array. To build the array, scan as with the hash table. Lookup in this array is worst case O(log k) (when all k distinct values have been found); there are n lookups, making a total of O(n log k) = O(n log(log n)). Newly discovered elements are added with insertion sort, there are k of them, so another O(k²) = O((log n)²) (which disappears asymptotically). Final scan to do counting sort is another O(n log(log n)).
@rici: A sorted array is also a good solution. The lookup is more expensive, but the worst case with hash tables is not pretty either. But do note that the “unaccounted work” is sorting k elements, so that cost also disappears asymptotically. I just focused on the stuff you need to do n times.
|
2

on the same note as here:

You create a binary tree, any tree: each node would be a list, of elements, with a distinct key.

now we iterate the array, and for each key we search for it in the tree, the search would take log(log(n)), as there is a maximum of log(n) distinct nodes in the tree. ( if the key doesnt exist we just add it as a node to the tree ).

so iterating the array would take O(n*log(log(n))) as we gave n elements.

finally, as this is a binary search tree, we can call in order, and would get the sorted order of the arrays.

and all that left is to combine them into single array. that would take O(n) time.

so we get O(n + nlog(log(n))=O(nlog(log(n)))

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.