std::unique with predicate comparing std::string not removing duplicate

Question

Unless I am missing something or missunderstand the mechanism (very likely) Shouldn't the "1" duplicate not exist in this vector ?

chunks.erase( std::unique ( chunks.begin(), chunks.end(), 
                           []( std::string &s1, std::string &s2 ){ 
                              return ( s1.compare(s2) == 0 ? true : false );}), 
                           chunks.end() );

Before Executing the above:

1       l:1
1+      l:2
1+1     l:3
1+1=    l:4
+       l:1
+1      l:2
+1=     l:3
1       l:1
1=      l:2
=       l:1

After executing the above code:

1       l:1
1+      l:2
1+1     l:3
1+1=    l:4
+       l:1
+1      l:2
+1=     l:3
1       l:1
1=      l:2
=       l:1

I have tried without a predicate (assuming std::strings that are identical would be removed). For some reason the "ones" are identified as identical? I have looked at their length (assuming a whitespace was stuck as a prefix or postfix) but they have the same length.

Am I missing something ?

You really shouldn't need to use this predicate as it is used by default. Can you please add more code as what are the exact strings and how you print the output? — Ivaylo Strandjev
– Ivaylo Strandjev, Commented Mar 14, 2013 at 15:54
Minor nitpick regarding return ( s1.compare(s2) == 0 ? true : false ); : this is redundant, just use return ( s1.compare(s2) == 0); — András Aszódi
– András Aszódi, Commented Mar 21, 2017 at 10:11

Matthieu M. · Accepted Answer · 2013-03-14 16:17:38Z

14

You are (probably) misunderstanding something.

std::unique only removes contiguous duplicates, so if you wish to remove all duplicates a precondition to applying std::unique is to sort your range using the same predicate.

edited Mar 14, 2013 at 16:17

answered Mar 14, 2013 at 15:55

Matthieu M.

303k58 gold badges495 silver badges754 bronze badges

Sign up to request clarification or add additional context in comments.

Comments

Drew Dormann · Accepted Answer · 2013-03-14 16:21:00Z

5

std::unique assumes that non-unique elements are adjacent, as they would be if (for one example) chunks were sorted. This allows std::unique to have O(n) complexity.

If you want to maintain a specific order in your vector and remove duplicates, that's a problem with O(n²) complexity. You can use the logic provided here to do that.

// Create a new vector without the duplicates
std::vector<string> unique_chunks;
for (std::vector<string>::iterator x = chunks.begin(); x != chunks.end();) {
  if ( unique_chunks.find(*x) != unique_chunks.end() ) {
    unique_chunks.push_back( *x );
  }
}

// Make chunks hold this new vector (allowing the old vector to be destroyed)
std::swap( chunks, unique_chunks );

And no, you didn't need that predicate.

edited Mar 14, 2013 at 16:21

answered Mar 14, 2013 at 15:55

Drew Dormann

65.5k14 gold badges133 silver badges200 bronze badges

6 Comments

RiaD Over a year ago

It could be done O(n log n) too, if I understand correct(sort + unique + sort)

Benjamin Lindley Over a year ago

@RiaD: Unless your two sorts have different ordering criteria, (sort + unique + sort) is exactly equivalent to (sort + unique).

Drew Dormann Over a year ago

@RiaD Perhaps? It would be more like (sort + unique + un-sort) though. I can't think of a good un-sort approach though.

RiaD Over a year ago

@BenjaminLindley, surely they are different

Benjamin Lindley Over a year ago

@RiaD: What's different? The ordering criteria? Then what is the ordering criteria for the second sort?

|

RiaD · Accepted Answer · 2013-03-14 17:51:05Z

3

As mentioned in other answer, unique deletes contiguous blocks of duplicate, if you need remove duplicates and preserve order of rest elements(order of first occurrence, here) in O(N log N) time you may do the following:

template<typename T>
bool bySecond(const pair<T, int>& a, const pair<T, int>& b) {
    return a.second < b.second;
}

template<typename T>
bool firstEqual(const pair<T, int>& a, const pair<T, int>& b) {
    return a.first == b.first;
}

template<typename it>
it yourUnique(it begin, it end){
    typedef typename std::iterator_traits<it>::value_type value_t;
    vector<pair<value_t, int>> v;
    for(it c = begin; c != end; ++c){
        v.push_back(make_pair(*c, v.size())); // second is start index;
    }
    sort(v.begin(), v.end()); // sort by value then by index
    v.erase(unique(v.begin(), v.end(), firstEqual<value_t>), v.end());
    sort(v.begin(), v.end(), bySecond<value_t>); // restore order.
    it c = begin;

    for(const auto& x: v){
       *(c++) = x.first;
    }
    return it;
}

Possibility to have own predicate isn't implemented. It's possible but one disadvantage is that you will have to provide less-than function, not equality one, that's maybe impossible in some cases.

edited Mar 14, 2013 at 17:51

answered Mar 14, 2013 at 16:33

RiaD

47.8k12 gold badges85 silver badges128 bronze badges

3 Comments

Benjamin Lindley Over a year ago

+1 Okay, clever. But you need a predicate for unique which ignores second. Otherwise, it won't remove anything, because every pair is unique by its second value.

Drew Dormann Over a year ago

Nice. It's verbose, but for big vectors you win the big O test. Edit: I said "clever", but when multiple devs use that word, it starts to get a certain smell :)

RiaD Over a year ago

@BenjaminLindley, sure. I'll add it now

David Rodríguez - dribeas · Accepted Answer · 2013-03-14 15:55:52Z

1

The std::unique algorithm assumes that the input range is in order and removes duplicates by comparing two consecutive values. To be able to use the algorithm you will need to sort the input first.

answered Mar 14, 2013 at 15:55

David Rodríguez - dribeas

209k23 gold badges304 silver badges496 bronze badges

Collectives™ on Stack Overflow

std::unique with predicate comparing std::string not removing duplicate

4 Answers 4

Comments

6 Comments

3 Comments

Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

4 Answers 4

Comments

6 Comments

3 Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related