Sorting an array in O(n log(log n))

Question

I am attempting to solve the following assignment: I'm given an array of n elements. it is known that not all keys of the array are distinct, but it is given that we have k distinct elements (k<=n of course). the assignment is to do a stable sort of the array in O(n log(log n)) worst case while k=O(log n). I'm allowed to use O(n) extra memory.

My current solution is described below:

Create a hash table with chaining of size k that does the following: if the hash function tries to insert an element to a place that already has a value in it - it checks if they are equal - if they are it adds it to the list, if not it starts moving in the array until it finds a place that has that same value or an empty space(which ever comes first).

This way the lists in each place only contains elements with equal keys. the insertion to the hashtable is from start to finish in the original array so each list is stably sorted. Then sort the array in the hash table with mergeSort (for lists we treat the first element as just one and move it).

After we are done merge sorting we copy the elements back to the original array by order and whenever we meet a list we copy each element by order.

Here is what I'm not sure about:

Is it true to say that because the hash table is size k and we only have k different elements, uniform hashing promises that the amount of time the hash function will try to give different values the same place in the array is negligible and therefore it's build time complexity is O(n)?

Because if so it seems the algorithm's runtime is O(n + k log k) = O(n + log n*log(log n)). which is definitely better then O(n log k) which what was required.

Not a duplicate question but might be relevant to read for OP: stackoverflow.com/questions/17041726/… — Sash Sinha
– Sash Sinha, Commented Jun 12, 2018 at 22:17
Assuming a good hashing function, building the hash table is O(n). Is there a particular reason you selected merge sort? That's going to require O(n) additional memory (or possibly O(2*k) additional memory if you do it indirectly). Stability of the sorting algorithm is a non-issue because you've guaranteed that the keys are distinct. — Jim Mischel
– Jim Mischel, Commented Jun 12, 2018 at 22:22
If it is known that all the elements are not distinct then 'k<n' and not '<=' — Samer Tufail
– Samer Tufail, Commented Jun 12, 2018 at 23:50

Cris Luengo · Accepted Answer · 2018-06-12 23:43:25Z

2

I think you're on the right track with the hash table, but I don't think you should insert the elements in the hash table, then copy them out again. You should use the hash table only to count the number of elements for each distinct value.

Next you compute the starting index for each distinct value, by traversing all values in order and adding the previous element's count to its start index:

start index for element i = start index for element i-1 + count for element i-1.

This step requires sorting the k elements in the hash table, which amounts to O(k log k) = O(log n log log n) operations, much less than the O(n) for steps 1 and 3.

Finally, you traverse your input array again, look it up in the table, and find the location in the output array for it. You copy the element, and also increase the start index for elements of its value, so that the next element will be copied after it.

If the comparison values for the items are consecutive integers (or integers in some small range), then you can use an array instead of a hash table for counting.

This is called counting sort.

edited Jun 12, 2018 at 23:43

answered Jun 12, 2018 at 22:15

Cris Luengo

61.4k10 gold badges75 silver badges135 bronze badges

Sign up to request clarification or add additional context in comments.

6 Comments

Jim Mischel Over a year ago

But counting sort doesn't work so well when the things you're comparing are keys, rather than items. For example, imagine sorting a list of n transactions that belong to k accounts. Each transaction is different, but you're sorting by account number. I don't see how your counting sort example can work.

Cris Luengo Over a year ago

@JimMischel: Why can't you count account numbers? You count the number of transactions for each account number, then you determine starting indices for each account number in the output array, then you copy over the transactions to their sorted location. Counting sort is not only for integers, see the Wikipedia article linked.

Jim Mischel Over a year ago

Ahhhh ... I see. Re-reading your description, I see that you keep a "next index" position in the hash table, so that when you do a sequential scan of the list you can put each element in its place. Sorry, didn't read closely enough.

rici Over a year ago

"traversing all values in order" seems to be doing some unaccounted work :). To get a guaranteed runtime, instead of a hash table, use a sorted array. To build the array, scan as with the hash table. Lookup in this array is worst case O(log k) (when all k distinct values have been found); there are n lookups, making a total of O(n log k) = O(n log(log n)). Newly discovered elements are added with insertion sort, there are k of them, so another O(k²) = O((log n)²) (which disappears asymptotically). Final scan to do counting sort is another O(n log(log n)).

Cris Luengo Over a year ago

@rici: A sorted array is also a good solution. The lookup is more expensive, but the worst case with hash tables is not pretty either. But do note that the “unaccounted work” is sorting k elements, so that cost also disappears asymptotically. I just focused on the stuff you need to do n times.

|

Mtkel N · Accepted Answer · 2020-07-07 13:11:00Z

2

on the same note as here:

You create a binary tree, any tree: each node would be a list, of elements, with a distinct key.

now we iterate the array, and for each key we search for it in the tree, the search would take log(log(n)), as there is a maximum of log(n) distinct nodes in the tree. ( if the key doesnt exist we just add it as a node to the tree ).

so iterating the array would take O(n*log(log(n))) as we gave n elements.

finally, as this is a binary search tree, we can call in order, and would get the sorted order of the arrays.

and all that left is to combine them into single array. that would take O(n) time.

so we get O(n + nlog(log(n))=O(nlog(log(n)))

edited Jul 7, 2020 at 13:11

answered Jul 7, 2020 at 13:03

Mtkel N

791 silver badge6 bronze badges

Collectives™ on Stack Overflow

Sorting an array in O(n log(log n))

2 Answers 2

6 Comments

Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

6 Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related