8

Suppose I have a file with integers (<10^6). I need to make a sorted array using those integers. Consider the following cases:

  1. Case 1 : Copy all the data into an array and sort (assume O(nlgn)).
  2. Case 2 : Sort while inserting each element into the array.

Which is safer and why? Which is faster and why? If the number of integers is further increased (>10^9), what are the implications then?

I tried both cases, and 'sort after' yielded better results in terms of speed. I can see why, but is there a better way to do case 2? (Currently I check the entering element against each element in the array to find its appropriate position.)

4
  • When you keep the array sorted you should use binary search to find the insertion point. Also, how do you actually insert into the array? Commented Dec 22, 2017 at 15:51
  • Finding the appropriate position and shifting everything after it up one position. Commented Dec 22, 2017 at 16:00
  • 1
    Are the numbers unique? If so, you could insert them into a SortedSet and then call toArray on that set. If there are repeats, you could use a TreeMap, but that is more complicated. Commented Dec 22, 2017 at 16:04
  • Another approach, for elements <= 10^7 where you don't have all n elements at once, is to use a Fenwick tree, where each update takes log(n). This will be a lot faster than case 2 (see the sketch below). Commented Dec 22, 2017 at 16:38
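A minimal sketch of the Fenwick tree (binary indexed tree) idea from the last comment, assuming the values are bounded by a known maximum (e.g. 10^7) and are at least 1; the class and method names are mine. Each incoming number is recorded in O(log maxValue), and at any point while streaming you can ask how many numbers seen so far are <= x:

```java
class FenwickCounter {
    private final int[] tree;   // 1-based binary indexed tree over the value domain

    FenwickCounter(int maxValue) {
        tree = new int[maxValue + 1];
    }

    // Record one occurrence of value (assumes 1 <= value <= maxValue), O(log maxValue).
    void add(int value) {
        for (int i = value; i < tree.length; i += i & -i) {
            tree[i]++;
        }
    }

    // How many recorded numbers are <= value, O(log maxValue).
    int countUpTo(int value) {
        int count = 0;
        for (int i = value; i > 0; i -= i & -i) {
            count += tree[i];
        }
        return count;
    }
}
```

This is useful when you need rank/prefix-count queries while the numbers are still arriving; if you only need the fully sorted array at the end, a plain counting pass over the value domain achieves the same result.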

3 Answers

6

The problem with inserting the element into the already-sorted array (aka Insertion Sort) is that while finding the index where to insert the element can be done in O(log n) using binary search, actually inserting the element requires shifting all the following elements, resulting in (on average) O(n/2) per insert for a plain array or ArrayList.
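To make the two costs visible, here is a minimal sketch of such a "sorted insert" into an ArrayList; the method name sortedInsert is mine:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class SortedInsert {
    // Insert value so that the list stays sorted. The binary search is O(log n),
    // but list.add(index, value) still shifts everything after index: O(n).
    static void sortedInsert(List<Integer> list, int value) {
        int index = Collections.binarySearch(list, value);
        if (index < 0) {
            index = -index - 1;   // binarySearch returns (-(insertionPoint) - 1) when the key is absent
        }
        list.add(index, value);   // the linear shift happens here
    }

    public static void main(String[] args) {
        List<Integer> list = new ArrayList<>();
        for (int x : new int[] {5, 1, 4, 1, 3}) {
            sortedInsert(list, x);
        }
        System.out.println(list);  // [1, 1, 3, 4, 5]
    }
}
```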

When using a LinkedList, the elements do not have to be shifted, but here you can't use binary search either: finding the index where to insert is about O(n/4) (according to this overview), and another O(n/4) for actually inserting the element, adding up to O(n/2), the same as for ArrayList. (You could create a custom Skip List providing faster lookup and simultaneous insertion, but AFAIK Java does not provide something like this.)

If the numbers are unique, you could consider inserting them into a TreeSet and then calling toArray at the end. Inserting each number will be O(log n), for a total of O(n log n), with an additional O(n) each time you get the numbers in sorted order. (According to your comment, they are not unique, but maybe this helps others.) You could still use a variant of this approach using a TreeMap, mapping elements to their count, but this will be more complex to implement.
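A sketch of that TreeMap variant for non-unique numbers (the names are mine): map each value to its count, then expand the entries, which a TreeMap iterates in ascending key order:

```java
import java.util.Map;
import java.util.TreeMap;

class TreeMapSort {
    // Collect counts in O(n log m), where m is the number of distinct values,
    // then write the sorted result out in O(n).
    static int[] sortViaTreeMap(int[] input) {
        TreeMap<Integer, Integer> counts = new TreeMap<>();
        for (int x : input) {
            counts.merge(x, 1, Integer::sum);   // increment the count for x
        }
        int[] sorted = new int[input.length];
        int i = 0;
        for (Map.Entry<Integer, Integer> e : counts.entrySet()) {
            for (int k = 0; k < e.getValue(); k++) {
                sorted[i++] = e.getKey();       // keys come out in ascending order
            }
        }
        return sorted;
    }
}
```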

Thus, collecting the numbers in a plain array, ArrayList, or LinkedList first and then sorting at the end seems to be better -- provided, of course, that you do not need the sorted version of the array in each step.

The "sorted insert" would give you (on average) O(log n + n/2) = O(n/2) for inserting each of the n numbers, for a total of O(n²/2), while maintaining a sorted array at all times. Sorting at the end is O(1) for inserting each of the n numbers plus O(n log n) at the end (or whenever you need the sorted list in between), resulting in O(n + k n log n) = O(k n log n) for sorting k > 0 times. (And if we solve for k, we see that sorting at the end would be faster as long as k < n / 2 log n.)


1 Comment

A linked list sort could be implemented using bottom-up merge sort, which merges nodes into a small array of pointers to sorted sublists. After all elements are received, the lists in that array are merged to form a single sorted list. However, a linked list sort is slower than an array sort, so it's probably not practical to use a linked list if using an array is an option.
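A rough sketch of the bottom-up scheme that comment describes, using a hypothetical singly linked Node type (not java.util.LinkedList): bin[i] holds a sorted sublist of up to 2^i nodes, and a final pass merges all bins:

```java
// Hypothetical minimal node type for illustration.
class Node {
    int value;
    Node next;
    Node(int value) { this.value = value; }
}

class BottomUpListSort {
    // Merge two already-sorted lists into one sorted list.
    static Node merge(Node a, Node b) {
        Node dummy = new Node(0), tail = dummy;
        while (a != null && b != null) {
            if (a.value <= b.value) { tail.next = a; a = a.next; }
            else                    { tail.next = b; b = b.next; }
            tail = tail.next;
        }
        tail.next = (a != null) ? a : b;
        return dummy.next;
    }

    // Bottom-up merge sort: bin[i] accumulates a sorted sublist of up to 2^i nodes.
    static Node sort(Node head) {
        Node[] bin = new Node[32];
        while (head != null) {
            Node node = head;
            head = head.next;
            node.next = null;                   // detach a single node
            int i = 0;
            while (i < bin.length && bin[i] != null) {
                node = merge(bin[i], node);     // carry into the next bin
                bin[i] = null;
                i++;
            }
            bin[Math.min(i, bin.length - 1)] = node;
        }
        Node result = null;
        for (Node list : bin) {                 // final pass: merge all bins
            result = merge(result, list);
        }
        return result;
    }
}
```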
1

Instead of checking the new element against every element in the array (linear search) you should use binary search: check in which half of the array it belongs, then cut that half in half and so on. This will yield O(n log n), same as the sort-after approach.
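Concretely, with a primitive int[] the lookup can use Arrays.binarySearch; note that the shift (System.arraycopy here) is still linear, which the comments below point out. The names are mine, and size tracks how many slots of the array are currently in use:

```java
import java.util.Arrays;

class BinaryInsert {
    // Insert value into values[0..size), keeping that prefix sorted.
    // Assumes the backing array has spare capacity (values.length > size).
    static int insertSorted(int[] values, int size, int value) {
        int index = Arrays.binarySearch(values, 0, size, value);
        if (index < 0) {
            index = -index - 1;            // insertion point for a missing key
        }
        // Shift to open a slot: O(size - index) -- this is the linear part.
        System.arraycopy(values, index, values, index + 1, size - index);
        values[index] = value;
        return size + 1;                   // new logical size
    }
}
```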

2 Comments

While this is true, the insertion itself will be linear from the insertion point to the end of the array, given that the array is laid out contiguously in memory (which I think most JVM arrays are).
You are right, of course. I was thinking about the question in mathematical terms of asymptotic runtime, not in actual programming terms. This means case 1 will always be faster regardless of the asymptotic runtime: repeatedly inserting into an array is simply a bad idea when you can just build the array from start to finish.
1

Case 1: inserting n elements into an array takes n time, since appending to an array is constant time. Then sorting after takes n log n time, thus n log n in total.

Case 2: inserting an element into a sorted array at the 'correct' position takes log n time (with binary search). Doing this for n elements yields a running time of n log n.

Both take the same amount of time, but case 2 will use n memory while, depending on your sorting algorithm, case 1 can require more memory.

4 Comments

Well, finding the position takes O(log n), but actually inserting the element (and shifting all the following elements "one up") might take O(n) depending on the type of list.
In the case of primitive arrays this will be useless, but it is best used for a LinkedList. Right?
@tobias_k So the worst-case complexity of case 2 is O(n²)?
@Atul Sounds about right, although in practice it's more like O(n²/2), as the list starts empty and you don't always have to shift all the elements.
