Java: Test for duplicate Objects in a Collection

Question

Given a List of MyClass objects (and a custom Comparitor myComparitor if needed), what good options are there for checking if the List contains two "equal" objects?

Edit: if there are duplicates, return a reference to one or more of the duplicates.

Overriding MyClass.equals(MyClass) in this case is not an option.

My initial thought is to create a hash table of sorts, but I suspect that there's a non-hack way to accomplish the same thing:

SortedSet mySet = new TreeSet(myComparitor);
mySet.addAll(myList);
// Find duplicates in a sorted set in O(N) time

P.S. Is there a good reference on Markdown?

Do you need to know which items are duplicates or do you just need to know if there are duplicates? — mnuzzo
– mnuzzo, Commented Aug 25, 2010 at 23:55
What do you mean with "equal objects"? If equals() method inherited from Object is not enough overriding is your only option. — Kerem Baydoğan
– Kerem Baydoğan, Commented Aug 25, 2010 at 23:56
P.S. Is there a good reference on Markdown? check the right hand column while editing message. — BalusC
– BalusC, Commented Aug 25, 2010 at 23:58
Sorry I didn't explictly say it, but Object.equals() isn't good enough in this case. [Edited above post: needs to return reference(s) to one or more duplicates, if any.] — Daniel
– Daniel, Commented Aug 27, 2010 at 13:32

Stephen C · Accepted Answer · 2010-08-27 13:40:10Z

3

If the element's equals(Object) method does not give you the semantic that you require, then HashMap or HashSet are not options. Your choices are:

Use a TreeMap for de-duping. This is O(NlogN).
Sort the ArrayList or a copy, then iterate over looking for element i equals element i + 1. This is O(NlogN).
Find an alternative implementation of hash sets that allows you to provide a separate object to implement equality and hashing. (Neither Apache or Google collections support this, so you'll need to look further afield.)
Create a wrapper class for your element type that overrides equals(Object) and hashCode(), and de-dup using a HashSet of wrapped objects. This is O(N), but the constant of proportionality will be larger than a simple HashSet due to creation of wrapper objects.

When de-duping with a Set it is probably better to use a loop rather than addAll. This is necessary if you need to know what all of the duplicates are. If you don't need to know that, then using a loop allows you to stop when you find the first duplicate. The only case where addAll is likely to perform better is when there are likely to be no duplicates.

edited Aug 27, 2010 at 13:40

answered Aug 26, 2010 at 0:15

Stephen C

723k95 gold badges849 silver badges1.3k bronze badges

Sign up to request clarification or add additional context in comments.

1 Comment

Daniel Over a year ago

Thanks, that's a good point - I can create a copy of the List and simply sort it. I'll probably go with this approach. (And another good point - I can create a TreeMap keyed by a manually generated hash value.) Thanks for the performance tip about Set.addAll(). I'm re-writing code which performs in O(N^2) and I think O(NlogN) should be acceptable (if the constant of proportionality is low).

John Gardner · Accepted Answer · 2010-08-26 00:20:07Z

0

if you already have a sorted list, you could just look at any element and the next element, and if they're the same you have dups.

in your question you are using a TreeSet, which would have culled out duplicates already, so if you just need to know if you have duplicates, just check the size of mySet vs the size of myList. if they aren't the same you have dups.

answered Aug 26, 2010 at 0:20

John Gardner

25.3k5 gold badges65 silver badges78 bronze badges

1 Comment

Daniel Over a year ago

Thanks, I've edited the post above to clarify the question. (You're right that looking for duplicates in a sorted list is simple. The TreeSet would automatically de-dupify if I create a wrapper class overriding Object.equals(), but there's overhead involved in doing that.)

Collectives™ on Stack Overflow

Java: Test for duplicate Objects in a Collection

2 Answers 2

1 Comment

1 Comment

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

1 Comment

1 Comment

Your Answer

Sign up or log in

Post as a guest

Linked

Related