45

I am looking for a really fast way to check for duplicates in a list of objects.

I was thinking of simply looping through the list and comparing the items manually, but I thought that LINQ might provide a more elegant solution...

Suppose I have an object...

public class dupeCheckee
{
     public string checkThis { get; set; }
     public string checkThat { get; set; }

     public dupeCheckee(string val, string val2)
     {
         checkThis = val;
         checkThat = val2;
     }
}

And I have a list of those objects

List<dupeCheckee> dupList = new List<dupeCheckee>();
dupList.Add(new dupeCheckee("test1", "value1"));
dupList.Add(new dupeCheckee("test2", "value1"));
dupList.Add(new dupeCheckee("test3", "value1"));
dupList.Add(new dupeCheckee("test1", "value1"));//dupe
dupList.Add(new dupeCheckee("test2", "value1"));//dupe... 
dupList.Add(new dupeCheckee("test4", "value1"));
dupList.Add(new dupeCheckee("test5", "value1"));
dupList.Add(new dupeCheckee("test1", "value2"));//not dupe

I need to find the dupes in that list. When I find them, I need to run some additional logic, which does not necessarily mean removing them.

When I use LINQ, my GroupBy call somehow fails to compile with this error...

'System.Collections.Generic.List<dupeCheckee>' does not contain a definition for 'GroupBy' and no extension method 'GroupBy' accepting a first argument of type 'System.Collections.Generic.List<dupeCheckee>' could be found (are you missing a using directive or an assembly reference?)

Which is telling me that I am missing a library. I am having a hard time figuring out which one though.

Once I figure that out, how would I check for those two conditions, i.e. that the same combination of checkThis and checkThat occurs more than once?

UPDATE: What I came up with

This is the LINQ query that I came up with after some quick research...

test.Count != test.Select(c => new { c.checkThat, c.checkThis }).Distinct().Count()

I am not certain if this is definitely better than this answer...

var duplicates = test.GroupBy(x => new {x.checkThis, x.checkThat})
                   .Where(x => x.Skip(1).Any());

I know I can put the first statement into an if else clause. I also ran a quick test. The duplicates list gives me back 1 when I was expecting 0 but it did correctly call the fact that I had duplicates in one of the sets that I used...

The other methodology does exactly as I expect it to. Here are the data sets that I use to test this out....

Dupes:

List<DupeCheckee> test = new List<DupeCheckee>{ 
     new DupeCheckee("test0", "test1"),//{ checkThis = "test", checkThat = "test1"}
     new DupeCheckee("test1", "test2"),//{ checkThis = "test", checkThat = "test1"}
     new DupeCheckee("test2", "test3"),//{ checkThis = "test", checkThat = "test1"}
     new DupeCheckee("test3", "test3"),//{ checkThis = "test", checkThat = "test1"}
     new DupeCheckee("test0", "test5"),//{ checkThis = "test", checkThat = "test1"}
     new DupeCheckee("test1", "test6"),//{ checkThis = "test", checkThat = "test1"}
     new DupeCheckee("test2", "test7"),//{ checkThis = "test", checkThat = "test1"}
     new DupeCheckee("test3", "test8"),//{ checkThis = "test", checkThat = "test1"}
     new DupeCheckee("test0", "test5"),//{ checkThis = "test", checkThat = "test1"}
     new DupeCheckee("test1", "test1"),//{ checkThis = "test", checkThat = "test1"}
     new DupeCheckee("test2", "test2"),//{ checkThis = "test", checkThat = "test1"}
     new DupeCheckee("test3", "test3"),//{ checkThis = "test", checkThat = "test1"}
     new DupeCheckee("test4", "test4"),//{ checkThis = "test", checkThat = "test1"}

};

No dupes...

     List<DupeCheckee> test2 = new List<DupeCheckee>{ 
     new DupeCheckee("test0", "test1"),//{ checkThis = "test", checkThat = "test1"}
     new DupeCheckee("test1", "test2"),//{ checkThis = "test", checkThat = "test1"}
     new DupeCheckee("test2", "test3"),//{ checkThis = "test", checkThat = "test1"}
     new DupeCheckee("test3", "test3"),//{ checkThis = "test", checkThat = "test1"}
     new DupeCheckee("test4", "test5"),//{ checkThis = "test", checkThat = "test1"}
     new DupeCheckee("test5", "test6"),//{ checkThis = "test", checkThat = "test1"}
     new DupeCheckee("test6", "test7"),//{ checkThis = "test", checkThat = "test1"}
     new DupeCheckee("test7", "test8"),//{ checkThis = "test", checkThat = "test1"}
     new DupeCheckee("test8", "test5"),//{ checkThis = "test", checkThat = "test1"}
     new DupeCheckee("test9", "test1"),//{ checkThis = "test", checkThat = "test1"}
     new DupeCheckee("test2", "test2"),//{ checkThis = "test", checkThat = "test1"}
     new DupeCheckee("test3", "test3"),//{ checkThis = "test", checkThat = "test1"}
     new DupeCheckee("test4", "test4"),//{ checkThis = "test", checkThat = "test1"}

};
  • Add using System.Linq; to the top of your cs file to make GroupBy work. Commented Apr 24, 2013 at 16:29
  • yep. Just figured out I was missing it. Thanks. Commented Apr 24, 2013 at 16:31
  • Erm, "No dupes" has a dupe: test3,test3. Commented Apr 25, 2013 at 17:35

8 Answers

75

You need to reference System.Linq (e.g. using System.Linq)

then you can do

var dupes = dupList.GroupBy(x => new {x.checkThis, x.checkThat})
                   .Where(x => x.Skip(1).Any());

This will give you groups with all the duplicates

The test for duplicates would then be

var hasDupes = dupList.GroupBy(x => new {x.checkThis, x.checkThat})
                   .Where(x => x.Skip(1).Any()).Any();

Or call ToList() or ToArray() to force evaluation of the result, so that you can both check for dupes and examine them without running the query twice.

For example:

var dupes = dupList.GroupBy(x => new {x.checkThis, x.checkThat})
                   .Where(x => x.Skip(1).Any()).ToArray();
if (dupes.Any()) {
  foreach (var dupeGroup in dupes) {
    Console.WriteLine(string.Format("checkThis={0},checkThat={1} has {2} duplicates",
                      dupeGroup.Key.checkThis, 
                      dupeGroup.Key.checkThat,
                      dupeGroup.Count() - 1));
  }

}

Alternatively

var dupes = dupList.Select((x, i) => new { index = i, value = x})
                   .GroupBy(x => new {x.value.checkThis, x.value.checkThat})
                   .Where(x => x.Skip(1).Any());

This gives you the same groups, but each item in a group stores its original position in the list in the property index and the item itself in the property value.
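As a rough end-to-end sketch of the grouping approach above, the following program runs it against the list from the question. It makes a few assumptions that are not in the original post: the class is renamed `DupeCheckee` with a public constructor, the helper `FindDuplicateGroups` is a hypothetical name, and a value tuple is used as the grouping key instead of an anonymous type so that the result can be returned from a method (this needs C# 7.3 or later).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the question's class, with a public constructor.
public class DupeCheckee
{
    public string CheckThis { get; }
    public string CheckThat { get; }

    public DupeCheckee(string checkThis, string checkThat)
    {
        CheckThis = checkThis;
        CheckThat = checkThat;
    }
}

public static class DuplicateGroupsDemo
{
    // Returns one group per (CheckThis, CheckThat) pair that occurs more than once.
    // The key is a value tuple (an assumption of this sketch, not the answer's anonymous type).
    public static List<IGrouping<(string, string), DupeCheckee>> FindDuplicateGroups(
        IEnumerable<DupeCheckee> items)
    {
        return items
            .GroupBy(x => (x.CheckThis, x.CheckThat))
            .Where(g => g.Skip(1).Any())   // keep only groups with at least two items
            .ToList();                     // evaluate once so the result can be reused
    }

    public static void Main()
    {
        var dupList = new List<DupeCheckee>
        {
            new DupeCheckee("test1", "value1"),
            new DupeCheckee("test2", "value1"),
            new DupeCheckee("test3", "value1"),
            new DupeCheckee("test1", "value1"), // dupe
            new DupeCheckee("test2", "value1"), // dupe
            new DupeCheckee("test4", "value1"),
            new DupeCheckee("test5", "value1"),
            new DupeCheckee("test1", "value2"), // not a dupe
        };

        var dupes = FindDuplicateGroups(dupList);
        Console.WriteLine(dupes.Any() ? "Has duplicates" : "No duplicates");

        foreach (var g in dupes)
        {
            Console.WriteLine("checkThis={0}, checkThat={1} has {2} extra copies",
                              g.Key.Item1, g.Key.Item2, g.Count() - 1);
        }
    }
}
```

With this input the sketch should report two groups, (test1, value1) and (test2, value1), each with one extra copy.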


6 Comments

I am really looking to see if the item has any dupes at all. It would be nice to have several 'List<dupeCheckee>' with all the duplicates in them... That will be nice if the user wants to remove them later, But I am really just looking to check if the list has dupes at all.
@DmainEvent That's what this does. If you want to check whether there are any dupes, just check dupes.Any(); if it is true, there are duplicates.
Could you take a look at my solution and see if you detect anything deficient about my solution. I tried both yours and mine, mine seems fine... Not certain about yours.
@DemainEvent Well in your original post you specified the requirement of extracting the duplicates, which your solution doesn't do.
@RuudLenders Yes you can, however I was trying to show the code as a progression, just adding any() on the end of the previous result
21

There are already plenty of working solutions, but I think the following one is more transparent and easier to understand than all of the above:

var hasDuplicatedEntries = ListWithPossibleDuplicates
                                   .GroupBy(YourGroupingExpression)
                                   .Any(e => e.Count() > 1);
if(hasDuplicatedEntries)
{
   // Do whatever you want when the list contains duplicates
}

2 Comments

Only use Count if you need the actual number of elements. It walks the whole enumeration.
For more optimized code replace e => e.Count() > 1 with e => e.Skip(1).Any()
5

I like using this to find out whether there are any duplicates at all. Let's say you had a string and wanted to know if there were any duplicate letters. This is what I use.

string text = "this is some text";

var hasDupes = text.GroupBy(x => x).Any(grp => grp.Count() > 1);

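If you wanted to know how many different letters are duplicated, no matter which letters they are, use this (spaces are filtered out so that only letters are counted).

var totalDupeItems = text.Where(c => char.IsLetter(c)).GroupBy(x => x).Count(grp => grp.Count() > 1);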

So for example, "this is some text" has this...

total of letter t: 3

total of letter i: 2

total of letter s: 3

total of letter e: 2

So variable totalDupeItems would equal 4. There are 4 different kinds of duplicates.

If you wanted to get the total number of items that belong to a duplicated letter, no matter which letters they are, then use this (again counting letters only).

var totalDupes = text.Where(c => char.IsLetter(c)).GroupBy(x => x).Where(grp => grp.Count() > 1).Sum(grp => grp.Count());

So the variable totalDupes would be 10. This is the total duplicate items of each dupe type added together.

1 Comment

Only use Count if you need the actual number of elements. It walks the whole enumeration.
1

I think this is what you're looking for:

List<dupeCheckee> duplicates = dupList.GroupBy(x => x)
                                   .SelectMany(g => g.Skip(1))
                                   .ToList();

2 Comments

That only works if an equality check on dupeCheckee treats two instances as equal when their checkThis and checkThat values are equal.
@BobVale: Didn't notice he wanted to break it down even further! Your comment upvoted.
1

For in-memory objects I always use the LINQ Distinct method together with a custom comparer.

public class dupeCheckee
{
     public string checkThis { get; set; }
     public string checkThat { get; set; }

     public dupeCheckee(string val, string val2)
     {
         checkThis = val;
         checkThat = val2;
     }

     public class Comparer : IEqualityComparer<dupeCheckee>
     {
         public bool Equals(dupeCheckee x, dupeCheckee y)
         {
             if (x == null || y == null)
                 return false;

             return x.checkThis == y.checkThis && x.checkThat == y.checkThat;
         }

         public int GetHashCode(dupeCheckee obj)
         {
             if (obj == null)
                 return 0;

             return (obj.checkThis == null ? 0 : obj.checkThis.GetHashCode()) ^
                 (obj.checkThat == null ? 0 : obj.checkThat.GetHashCode());
         }
     }
}

Now we can call

List<dupeCheckee> dupList = new List<dupeCheckee>();
dupList.Add(new dupeCheckee("test1", "value1"));
dupList.Add(new dupeCheckee("test2", "value1"));
dupList.Add(new dupeCheckee("test3", "value1"));
dupList.Add(new dupeCheckee("test1", "value1"));//dupe
dupList.Add(new dupeCheckee("test2", "value1"));//dupe... 
dupList.Add(new dupeCheckee("test4", "value1"));
dupList.Add(new dupeCheckee("test5", "value1"));
dupList.Add(new dupeCheckee("test1", "value2"));//not dupe

var distinct = dupList.Distinct(new dupeCheckee.Comparer());
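A comparer like this is not limited to Distinct. As a hedged sketch, the same kind of comparer can answer "does the list contain dupes?" by comparing counts, and it can be passed to the GroupBy overload that takes an IEqualityComparer<TKey> to pull out the extra copies. The names `DupeCheckee`, `DupeCheckeeComparer`, `HasDuplicates` and `ExtraCopies` below are hypothetical and were chosen for this illustration only.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the class above, with a public constructor.
public class DupeCheckee
{
    public string CheckThis { get; }
    public string CheckThat { get; }

    public DupeCheckee(string checkThis, string checkThat)
    {
        CheckThis = checkThis;
        CheckThat = checkThat;
    }
}

// Two items are equal when both string properties match.
public sealed class DupeCheckeeComparer : IEqualityComparer<DupeCheckee>
{
    public bool Equals(DupeCheckee x, DupeCheckee y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x is null || y is null) return false;
        return x.CheckThis == y.CheckThis && x.CheckThat == y.CheckThat;
    }

    public int GetHashCode(DupeCheckee obj)
    {
        if (obj is null) return 0;
        return (obj.CheckThis?.GetHashCode() ?? 0) ^ (obj.CheckThat?.GetHashCode() ?? 0);
    }
}

public static class ComparerDemo
{
    // True when removing duplicates changes the number of items.
    public static bool HasDuplicates(List<DupeCheckee> items)
    {
        return items.Count != items.Distinct(new DupeCheckeeComparer()).Count();
    }

    // Every item after the first one in each group of equal items.
    public static List<DupeCheckee> ExtraCopies(IEnumerable<DupeCheckee> items)
    {
        return items
            .GroupBy(x => x, new DupeCheckeeComparer())
            .SelectMany(g => g.Skip(1))
            .ToList();
    }

    public static void Main()
    {
        var dupList = new List<DupeCheckee>
        {
            new DupeCheckee("test1", "value1"),
            new DupeCheckee("test2", "value1"),
            new DupeCheckee("test1", "value1"), // dupe
            new DupeCheckee("test1", "value2"), // not a dupe
        };

        Console.WriteLine(HasDuplicates(dupList));     // True
        Console.WriteLine(ExtraCopies(dupList).Count); // 1
    }
}
```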

1 Comment

That is getting a distinct list, but I am looking to figure out if my list has dupes in it.
0

Do a select distinct with LINQ, e.g. as described in "How can I do SELECT UNIQUE with LINQ?"

Then compare the count of the distinct results with the count of the original list. That gives you a boolean that says whether the list has duplicates.

Also, you could try using a Dictionary, which will guarantee the key is unique.
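In the same spirit as the Dictionary idea, here is a minimal sketch that uses a HashSet instead (a swapped-in technique, not something this answer proposes). HashSet<T>.Add returns false when the value is already present, so the loop can stop at the first repeated key instead of processing the whole list. The helper name `HasDuplicates` and the tuple-based sample data are assumptions made for the illustration.

```csharp
using System;
using System.Collections.Generic;

public static class HashSetDuplicateCheck
{
    // Returns true as soon as a key is seen for the second time.
    public static bool HasDuplicates<TSource, TKey>(
        IEnumerable<TSource> source, Func<TSource, TKey> keySelector)
    {
        var seen = new HashSet<TKey>();
        foreach (var item in source)
        {
            // Add returns false if the key was already in the set.
            if (!seen.Add(keySelector(item)))
                return true;
        }
        return false;
    }

    public static void Main()
    {
        // Value tuples compare by value, so they work as composite keys.
        var withDupes = new List<(string checkThis, string checkThat)>
        {
            ("test1", "value1"), ("test2", "value1"), ("test1", "value1"),
        };
        var withoutDupes = new List<(string checkThis, string checkThat)>
        {
            ("test1", "value1"), ("test1", "value2"),
        };

        Console.WriteLine(HasDuplicates(withDupes, x => (x.checkThis, x.checkThat)));    // True
        Console.WriteLine(HasDuplicates(withoutDupes, x => (x.checkThis, x.checkThat))); // False
    }
}
```

This only answers whether duplicates exist; if the duplicated items themselves are needed, grouping is still the more direct route.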

2 Comments

If he wants to do something with the dupes GroupBy is the better approach.
@Daniel Post it as an answer so I can upvote it and the user can mark it as an answer!
0

ToDictionary throws an ArgumentException if any key occurs more than once, because the dictionary checks key uniqueness itself. This is the easiest way.

try
{
  dupList.ToDictionary(a => new { a.checkThis, a.checkThat });
}
catch (ArgumentException)
{
  // a duplicate key was found: the list items are not unique
}


0

I introduced a generic extension method that checks for duplicates by key:

public static class CollectionExtensions
{
    public static bool HasDuplicatesByKey<TSource, TKey>(this IEnumerable<TSource> source
                                                       , Func<TSource, TKey> keySelector)
    {
        return source.GroupBy(keySelector).Any(group => group.Skip(1).Any());
    }
}

Usage example:

if (items.HasDuplicatesByKey(item => item.Id))
{
    throw new InvalidOperationException($@"Set {nameof(items)} has duplicates.");
}
