My code works as far as I can tell. I was wondering if it can be done in a better way (better time complexity), and what the time complexity of my code is, as I'm not really sure how to calculate it. I can't change the original array in this exercise, but if there is a faster way that works by removing elements from it, I'd also like to know. Thanks a lot.

    int i = 1, j = 0, count = 1;
    int arrNew[SIZE] = { 0 }; /* { NULL } is a pointer constant; use 0 for an int array */
    arrNew[0] = arr1[0];
    while (i < size) {
        if (arr1[i] == arrNew[j]) { // element of arr1 already added: reset j and move to the next element
            j = 0;
            i++;
        }
        else {
            if (j == count - 1) { // reached the end of arrNew: add the missing element
                arrNew[count] = arr1[i];
                j = 0;
                count++; // count makes sure we check only the assigned elements of arrNew
                i++;
            }
            else // if j < count - 1 we haven't finished checking all of arrNew yet
                j++;
        }
    }
  • No, I checked it on 3 unsorted arrays and it worked well. Commented Dec 8, 2021 at 18:56
  • Um, pretty sure this will work for non-sorted arrays, but: this will degenerate to O(n²) worst case (array already populated with distinct unique values). For large arrays you're probably better off sorting the original, then single-passing to throw out duplicates. You'll lose the original ordering in doing so, however, which may be a deal breaker for that approach. Depending on the domain, a counting flag-indicator array is also an option, and would be roaring fast if the model fits (see the sketch after these comments). Unrelated: a VLA is a recipe for disaster for larger arrays, and probably also needs consideration. Commented Dec 8, 2021 at 18:56
  • But I used the count variable to make sure I check only assigned spaces in the array; is it still O(n²)? SIZE is a constant defined outside the function (I used #define SIZE) because we haven't learned how to make a dynamic array yet. Commented Dec 8, 2021 at 19:05
  • Worst case (already distinct unique values), the run will be O(n²). Consider what happens with each pass: every new item, as i marches up the original, will compare against every already-classified item in the new array. Since they're all unique already, that means sweeps of 1, 2, 3, 4, ... (n-2), (n-1) comparisons. That's a worst-case O(n²) result. FYI, the best case is the entire original filled with the same value: that means an immediate jettison after the first compare on each iteration, resulting in only one final value in the array and an O(n) best case. Commented Dec 8, 2021 at 19:10
  • 1. Your loop is confusingly written, since it increments i or j conditionally, hiding the loop complexity. 2. if permitted, the sanest approach is to do a "sort into new array" (which is O(n log n)), then remove duplicates in a second pass (alternatively, this could be done incrementally using a "partial sort", but this has worse complexity: AFAICT O((n/m)n log(m)) if done in chunks of size m), 3. if you want to preserve the original order, you could do approach #2 into a temporary buffer, then allocate a bitset and do another pass over the input, doing a binary search and marking it. Commented Dec 8, 2021 at 19:16
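To make the flag-indicator idea from the comments concrete, here is a minimal sketch. It assumes the values are known to lie in a small non-negative range; RANGE and the function name dedupe_flags are made up for illustration:

    #include <stdbool.h>
    #include <stddef.h>

    #define RANGE 1000 /* hypothetical bound: assumes 0 <= arr[i] < RANGE */

    /* Single pass with one "seen" flag per possible value: O(n) time,
       O(RANGE) extra space, and the original order is preserved. */
    size_t dedupe_flags(const int *arr, size_t n, int *out)
    {
        bool seen[RANGE] = { false };
        size_t count = 0;
        for (size_t i = 0; i < n; i++) {
            if (!seen[arr[i]]) {
                seen[arr[i]] = true;  /* first time this value appears */
                out[count++] = arr[i];
            }
        }
        return count;
    }

The trade-off is memory proportional to the value range rather than to the array length, which is why the comment hedges with "if the model fits".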

1 Answer

I was wondering if it can be done in a better way (better time complexity)

It's a little hard to tell what's going on at first, but it looks like you're basically using one loop to do two jobs. You're looping on i to step through the original array, but also using j to scan through the new array for each new element. Effectively, you've got nested loops that both potentially have the same size, so you've got O(n²) complexity.

I'd suggest rewriting your code so that the two loops are explicit. You're not saving any time by making one loop do double duty, and if you come back to this code a month from now you're going to waste a bunch of time trying to remember how it works. Make your code obvious — it's as much about communicating with your future self or your coworkers as with the compiler.

Can you improve on that O(n²) complexity? Yes, definitely. One way is to sort the array, so that duplicate values end up being adjacent to each other in the array. It's then easy to just not copy any values that are the same as the preceding value. I know you can't modify the original array, but you can copy the whole thing, sort the copy, and then copy that array while removing dupes. That'd give you O(n log n) complexity (if you choose an efficient sorting algorithm). In fact, you could speed that up a bit by combining the sorting and copying, but you'd still end up with O(n log n) complexity. Another way is to use a hash table: check whether the value exists in the table, tossing it if it does, or adding it to the table and copying it to the new array if it doesn't. That'd be close to O(n).
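A minimal sketch of that copy-sort-dedupe approach, assuming int elements; the scratch and out buffers and the function names are placeholders:

    #include <stdlib.h>
    #include <string.h>

    /* qsort comparator for ints, ascending. */
    static int cmp_int(const void *a, const void *b)
    {
        int x = *(const int *)a, y = *(const int *)b;
        return (x > y) - (x < y); /* avoids the overflow risk of x - y */
    }

    /* Copies arr into scratch, sorts the copy, then writes each distinct
       value to out. The original array is untouched. O(n log n) overall:
       one O(n) copy, one O(n log n) sort, one O(n) dedupe pass. */
    size_t dedupe_sorted_copy(const int *arr, size_t n, int *scratch, int *out)
    {
        if (n == 0)
            return 0;
        memcpy(scratch, arr, n * sizeof *scratch);
        qsort(scratch, n, sizeof *scratch, cmp_int);
        size_t count = 0;
        out[count++] = scratch[0];
        for (size_t i = 1; i < n; i++)
            if (scratch[i] != scratch[i - 1]) /* duplicates are adjacent after sorting */
                out[count++] = scratch[i];
        return count;
    }

Note that the sorted copy loses the original ordering, as the comments point out. The hash-table variant would preserve it, but C's standard library has no hash set, so you'd have to hand-roll one or pull in a third-party library.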

8 Comments

Remember that a combined sort_and_copy implementation is usually more efficient than a separate copy and sort_in_place, especially if stability is desired. Regretfully, the C standard library only offers in-place non-stable sorting, but fortunately most people know to use third-party libraries for everything.
The sorting is O(n log n), isn't it? Afterwards I'll still need to go through the array to remove the duplicates. Won't it take more time unless I have very few duplicates in the array?
Most general-purpose in-place stable sorting algorithms are O(n log² n) or worse, actually (supposedly "block merge sort" is an exception, but I'd never heard of it before looking it up just now, and even its description suggests that extra memory is preferred, since otherwise it has fewer best cases). If you drop either the "stable" or "in-place" requirement, or can place harsh restrictions on your input, many more algorithms are known.
Removing duplicates in-place is just O(n), which disappears next to the complexity of sorting. Simply iterate once over the array with two indexes: increment "source" unconditionally but "dest" only if you actually found a new value. Then do a realloc to shrink it at the end (see the sketch after these comments).
@technical The idea I described is: copy the array, sort that in place, and then copy that while removing duplicates. So you're copying the array twice (both O(n)) and sorting it once (O(n log n)), so the overall complexity is O(n log n). When we talk about big-O complexity, we're really interested in how the performance changes with the size of the problem.
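For reference, a minimal sketch of the two-index in-place pass described in the comment above; it assumes the array has already been sorted, and the function name is made up:

    #include <stddef.h>

    /* Two-index dedupe over a sorted array: src visits every element,
       dest advances only when a new value appears. Returns the new
       logical length; O(n). */
    size_t dedupe_in_place(int *a, size_t n)
    {
        if (n == 0)
            return 0;
        size_t dest = 0;
        for (size_t src = 1; src < n; src++)
            if (a[src] != a[dest]) /* found a value not yet kept */
                a[++dest] = a[src];
        return dest + 1; /* caller can realloc down to this if desired */
    }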