1

Given a List of MyClass objects (and a custom Comparitor myComparitor if needed), what good options are there for checking if the List contains two "equal" objects?

Edit: if there are duplicates, return a reference to one or more of the duplicates.

Overriding MyClass.equals(MyClass) in this case is not an option.

My initial thought is to create a hash table of sorts, but I suspect that there's a non-hack way to accomplish the same thing:

SortedSet mySet = new TreeSet(myComparitor);
mySet.addAll(myList);
// Find duplicates in a sorted set in O(N) time

P.S. Is there a good reference on Markdown?

5
  • possible duplicate of Java: Detect duplicates in ArrayList? Commented Aug 25, 2010 at 23:54
  • Do you need to know which items are duplicates or do you just need to know if there are duplicates? Commented Aug 25, 2010 at 23:55
  • What do you mean with "equal objects"? If equals() method inherited from Object is not enough overriding is your only option. Commented Aug 25, 2010 at 23:56
  • 2
    P.S. Is there a good reference on Markdown? check the right hand column while editing message. Commented Aug 25, 2010 at 23:58
  • Sorry I didn't explictly say it, but Object.equals() isn't good enough in this case. [Edited above post: needs to return reference(s) to one or more duplicates, if any.] Commented Aug 27, 2010 at 13:32

2 Answers 2

3

If the element's equals(Object) method does not give you the semantic that you require, then HashMap or HashSet are not options. Your choices are:

  • Use a TreeMap for de-duping. This is O(NlogN).
  • Sort the ArrayList or a copy, then iterate over looking for element i equals element i + 1. This is O(NlogN).
  • Find an alternative implementation of hash sets that allows you to provide a separate object to implement equality and hashing. (Neither Apache or Google collections support this, so you'll need to look further afield.)
  • Create a wrapper class for your element type that overrides equals(Object) and hashCode(), and de-dup using a HashSet of wrapped objects. This is O(N), but the constant of proportionality will be larger than a simple HashSet due to creation of wrapper objects.

When de-duping with a Set it is probably better to use a loop rather than addAll. This is necessary if you need to know what all of the duplicates are. If you don't need to know that, then using a loop allows you to stop when you find the first duplicate. The only case where addAll is likely to perform better is when there are likely to be no duplicates.

Sign up to request clarification or add additional context in comments.

1 Comment

Thanks, that's a good point - I can create a copy of the List and simply sort it. I'll probably go with this approach. (And another good point - I can create a TreeMap keyed by a manually generated hash value.) Thanks for the performance tip about Set.addAll(). I'm re-writing code which performs in O(N^2) and I think O(NlogN) should be acceptable (if the constant of proportionality is low).
0

if you already have a sorted list, you could just look at any element and the next element, and if they're the same you have dups.

in your question you are using a TreeSet, which would have culled out duplicates already, so if you just need to know if you have duplicates, just check the size of mySet vs the size of myList. if they aren't the same you have dups.

1 Comment

Thanks, I've edited the post above to clarify the question. (You're right that looking for duplicates in a sorted list is simple. The TreeSet would automatically de-dupify if I create a wrapper class overriding Object.equals(), but there's overhead involved in doing that.)

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.