6

I am learning C and came over the topic of sorting. I wrote a comp() function in and used qsort to sort an array of int. Now for the next task I need to remove the duplicates from the array.
Is it possible to sort and remove duplicates at the same time?

#include <string.h>
#include <stdio.h>
#include <stdlib.h>
#include <ctype.h>    
int indexes[10] = { 0, 98, 45, 65, 45, 98, 78, 56, 65, 45 };

int comp(const void * elem1, const void * elem2) {

    int f = *((int*) elem1);
    int s = *((int*) elem2);

    if (f > s) {    
        return 1;
    }    
    if (f < s) {    
        return -1;
    }    
    return 0;
}

void printIndexArray() {    
    int i = 0;    
    for (i = 0; i < 10; i++) {    
        printf("i is %d\n", indexes[i]);    
    }
}

int main() {    
    qsort(indexes, sizeof(indexes) / sizeof(int), sizeof(int), comp);    
    printIndexArray();    
    return 0;
}
1
  • 2
    You are using inbuild function for sorting, write your own function. Commented Sep 20, 2013 at 19:58

5 Answers 5

2

Since your numbers are already sorted, removing dupes is easy. In C++, it's even built in as std::unique:

http://en.cppreference.com/w/cpp/algorithm/unique

Assuming you want to do it yourself, you can do it the same way unique does it:

int* unique (int* first, int* last)
{
  if (first==last) return last;

  int* result = first;
  while (++first != last)
  {
    if (!(*result == *first)) 
      *(++result)=*first;
  }
  return ++result;
}
Sign up to request clarification or add additional context in comments.

4 Comments

how do I use the above method, just started to learn so my pointers are that good. Can u help integrate the unique function into my code above
Just call unique(indexes, indexes + (sizeof(indexes) / sizeof(int)));
BTW, I recommend making a macro or something to get the array size. Something like #define arrsize(x) (sizeof(x) / sizeof(x[0]))
If you have a macro like that, it would be unique(indexes, indexes + arrsize(indexes));
1

Yes

This can be achieved by mergesort. If both left and right are the same just merge the one value

Comments

1

That's the code that removes the duplicates using mergesort. This snippet of code does the removing work:

else if(a[p1] == a[p2])
{
    merged[p] = a[p1];
    p1++;
    p2++;
}

That's the iterative merge sort while the recursive version would be easier.

#include <stdio.h>
#include <stdlib.h>

#define min(a,b) (((a) < (b)) ? (a) : (b))

int indexes[10] = { 0, 98, 45, 65, 45, 98, 78, 56, 65, 45 };

void merge(int *a, int s, int m, int e)
{
    int p1 = s;
    int p2 = m + 1;
    int * merged = (int*)malloc(sizeof(int) * (e - s + 1));
    int p = 0;
    while(p1 < m + 1 && p2 < e + 1)
    {
        if(a[p1] > a[p2])
        {
            merged[p] = a[p2];
            p2++;
        }
        else if(a[p1] == a[p2])
        {
            merged[p] = a[p1];
            p1++;
            p2++;
        }
        else
        {
            merged[p] = a[p1];
            p1++;
        }
        p++;
    }

    while(p1 < m + 1)
    {
        merged[p++] = a[p1++];
    }

    while(p2 < e + 1)
        merged[p++] = a[p2++];

    int i;
    for(i = 0;i < (e -s+1); i++)
    {
        a[s + i] = merged[i];
    }

    free(merged);
}

void merge_sort(int *a, int n)
{
    int width;
    for(width = 1; width < n; width = 2 * width)
    {
        int i;
        for(i = 0; i < n; i = i + 2 * width)
        {
            merge(a, i, min(i + width - 1, n - 1), min(i + 2 * width - 1, n - 1) );
        }
    }
}

void printIndexArray()
{    
    int i = 0;    
    for(i = 0; i < 10; i++)
    {    
        printf("i is %d\n", indexes[i]);    
    }
}

int main()
{
    merge_sort(indexes, sizeof(indexes) / sizeof(int) );
    printIndexArray();
    return 0;
}

Comments

1
#include <stdio.h>
#include <stdlib.h>

int indexes[10] = { 0, 98, 45, 65, 45, 98, 78, 56, 65, 45 };

size_t undup(int array[], size_t len)
{
size_t src,dst;

if (!len) return 0;
for (src=dst=1; src < len; src++) {
        if (array[dst-1] == array[src]) continue;
        array[dst++] = array[src];
        }
return dst;
}

int comp(const void * elem1, const void * elem2) {

    int f = *((int*) elem1);
    int s = *((int*) elem2);

    if (f > s)     return 1;
    if (f < s)     return -1;

    return 0;
}

void printIndexArray(size_t len) {
    size_t i = 0;
    for (i = 0; i < len; i++) {
        printf("array[%zu] is %d\n", i, indexes[i]);
    }
}

int main() {
    size_t len = 10;
    printf("Before sort\n" );
    printIndexArray(len);

    qsort(indexes, sizeof indexes / sizeof indexes[0], sizeof indexes[0], comp);
    printf("After sort\n" );
    printIndexArray(len);

    len = undup(indexes,10);
    printf("After undup\n" );
    printIndexArray(len);

    return 0;
}

Comments

0

The short answer is: yes.

The long answer is: it is always possible, but the complexity to do it depends heavily on the algorithm you use.

The more complex algorithms like quick-sort, slow-sort, bucket-sort, and straight-radix-sort do not lend themselves to such an enhancement, because they rely on the data being in a consecutive array, that can implicitly be split into subarrays. So, when you detect a duplicate, you cannot easily take it out. Again, it is possible, but certainly not a problem for beginners.

The less complex in-place algorithms like bubble-sort, insertion-sort, and shell-sort make it relatively easy: you can just replace one of the duplicates you detect with a sentinel value that sorts greater than all legal values, and let it rise to the top. After that, you just need to scoop off the cream of sentinel values and you are done.

The algorithms that really lend themselves to removing duplicates, are the ones that use intermediate arrays that grow/shrink in the process; in these cases you can just shrink or skip growing one of these intermediate arrays when you detect a duplicate. Candidates are merge-sort and heap-sort.

Note, however, that it is more prudent to just sort the array, and eliminate duplicates in a second, separate step. Why? Because eliminating duplicates adds complexity to the inner loop of the sorting algorithm, which is of O(n*log(n)) in most relevant cases. But eliminating duplicates from a sorted array is an O(n) operation, making the split operation faster than the fused one.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.