Java Finding duplicate in a list

Question

I have a list of objects (e.g. List and I have to find duplicate in the list. I don't have the source code of UnknownSrcClass and UnknownSrcClass doesn't have hash code and equals implemented. So I can't put it in a Set to find the duplicates

I have below two solution

Build a HashMap<String,List<UnknownSrcClass>> where key will be build using the fields responsible for equality check.

Iterate the HashMap if for a key list size > 1 then iterate the list of items and find duplicates

Put items in a TressSet with Comprator and check for add method's return value.

Please suggest me which one would more efficient performance wise. Per me #3 is better approach.

The answer to any "which is the most efficient"-type questions is: what do you mean by "efficient"? If you mean which has the smallest asymptotic complexity, well, just look up the asymptotic complexity for the two data structures; if you mean which takes less time for your application, you need to implement both and time them. — Andy Turner
– Andy Turner, Commented Sep 22, 2016 at 21:09
Different approach - how about an UnknownSrcClassWrapper which implements hashCode() and equals() on the relevant fields of UnknownSrcClass? — Andrew S
– Andrew S, Commented Sep 22, 2016 at 21:24
So if you don't know what UnknownSrcClass is, how are you supposed to tell whether two instances are duplicates? — Joe C
– Joe C, Commented Sep 22, 2016 at 21:50

fabian · Accepted Answer · 2016-09-22 22:50:34Z

1

The first approach may not be be more efficient than the second approach. Let's assume the equality check is based on 2 strings that you concatenate. There are different possibilities to get to the same string.

In the worst case you get a HashMap with a single key but n different elements in the value. Pairwise comparing them leads to running time of O(n²). This is worse than the O(n * log(n)) running time you achieve using a TreeSet.

If you use something like the first approach, create something does not lead to different values being mapped to the same key, e.g. combine the values using Arrays.asList:

HashSet<List<Object>> set = new HashSet<>();
for (Iterator<UnknownSrcClass> iterator = list.iterator(); iterator.hasNext();) {
    UnknownSrcClass element = iterator.next();
    List<Object> lst = (element == null ? null : Arrays.asList(element.getProperty1(), element.getProperty2(), ...));
    if (!set.add(lst)) {
        // handle duplicate, e.g.
        iterator.remove();
    }
}

answered Sep 22, 2016 at 22:50

fabian

82.9k12 gold badges99 silver badges124 bronze badges

Sign up to request clarification or add additional context in comments.

4 Comments

Debopam Over a year ago

Thanks for the answer. What I did was build String (String concatenation) using all properties that was used in equality check. Use that key in HashMap. That way I believe it will distribute the data to multiple keys. Let me know your thought.

fabian Over a year ago

@Debopam A concatenation the same string can be created by different properties, e.g. "abdefg" = "" + "abcdefg" = "a"+"bcdefg" = "ab"+"cdefg" = "abc"+ "defg" = "abcde"+"fg" = "abcdef" +"g" = "abcdefg"+"" For the worst case you always get the same string from your properties, but the strings are different and therefore the O(n²) for comparing all of them. The list returned by Arrays.asList implements hashCode/equals based on the hashcodes/equality of the elements preventing this kind of behavior...

Debopam Over a year ago

Thanks. Is this approach (code snippet you have given) more efficient than TreeSet with comparator?

fabian Over a year ago

@Debopam: In the expected case, yes. However if you are unlucky with the hash codes, all the elements could be mapped to the same bucket. In that case the performance would still be worse than the performance of a TreeSet.

Hassan Mirzaee · Accepted Answer · 2016-09-22 21:55:31Z

I think the #1 is Ok, cos I think the cost of #1 would be O(n) but #3 would be > O(n) as long as compare would be called for each entry through the entire list. this is my #1 option:

public class Main {

static class Model {
public final Long id;
public final String field1;
public final boolean fieldn;

public Model(Long id, String field1, boolean fieldn) {
    super();
    this.id = id;
    this.field1 = field1;
    this.fieldn = fieldn;
}

}

public static void main(String[] args) throws InterruptedException {

List<Model> list = Arrays.asList(new Model(1L, "sample 1", true), new Model(1L, "sample 1", true));
Map<String, List<Model>> doublications = new HashMap<>();
list.forEach(m -> checkDoublication(doublications, m));
doublications.forEach(Main::print);
// and this would print => key: "1sample 1true", doublications: 1

}

private static void print(String key, List<Model> list) {
System.out.println(String.format("key: \"%s\", doublications: %d", key, list.size()));
}

private static String key(Model model) {
return model.id + model.field1 + model.fieldn;
}

private static void checkDoublication(Map<String, List<Model>> map, Model model) {
String key = key(model);
if (!map.containsKey(key))
    map.put(key, new LinkedList<>());
else
    map.get(key)
       .add(model);

}

Collectives™ on Stack Overflow

Java Finding duplicate in a list

2 Answers 2

4 Comments

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

4 Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Related