Remove duplicate strings in a List in Java

Question

Update: I guess HashSet.add(Object obj) does not call contains(). Is there a way to implement what I want (removing duplicate strings, ignoring case, using a Set)?

Original question: I am trying to remove duplicates from a list of String in Java. However, in the following code CaseInsensitiveSet.contains(Object obj) is not getting called. Why?

public static List<String> removeDupList(List<String> list, boolean ignoreCase) {
    Set<String> set = (ignoreCase ? new CaseInsensitiveSet() : new LinkedHashSet<String>());
    set.addAll(list);

    List<String> res = new Vector<String>(set);
    return res;
}


public class CaseInsensitiveSet  extends LinkedHashSet<String>{

    @Override
    public boolean contains(Object obj){
        //this not getting called.
        if(obj instanceof String){

            return super.contains(((String)obj).toLowerCase());
        }
        return super.contains(obj);
    }

}

Please learn how to use code formatting rather than using <pre> statements.
– Andrew Thompson, Commented Dec 26, 2012 at 12:09
Do you want to retain the order of the strings in the list? If so, for duplicates, do you want to take the position of the first or the last occurrence?
– Tom Anderson, Commented Dec 26, 2012 at 12:17

Evgeniy Dorofeev · Accepted Answer · 2012-12-26 13:38:23Z

8

Try

        Set<String> set = new TreeSet<String>(String.CASE_INSENSITIVE_ORDER);
        set.addAll(list);
        return new ArrayList<String>(set);
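To see what this does to both the casing and the order, here is a small self-contained sketch. The class name, the dedupSorted helper and the sample input are made up for illustration. Note that the result comes back sorted case-insensitively, and that for each group of duplicates the first spelling encountered is the one that is kept.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class TreeSetDedupDemo {

    // Hypothetical helper wrapping the TreeSet approach.
    static List<String> dedupSorted(List<String> list) {
        // Strings that compare equal under CASE_INSENSITIVE_ORDER count as the same element.
        Set<String> set = new TreeSet<String>(String.CASE_INSENSITIVE_ORDER);
        set.addAll(list);
        return new ArrayList<String>(set);
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("b", "A", "a", "B");
        // "a" is dropped as a duplicate of "A", and "B" as a duplicate of "b".
        System.out.println(dedupSorted(input)); // prints [A, b]
    }
}
```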

UPDATE: as Tom Anderson mentioned, this does not preserve the initial order. If that is really an issue, try

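    // Note: this removes the duplicates from list itself rather than building a new list.
    Set<String> set = new TreeSet<String>(String.CASE_INSENSITIVE_ORDER);
    Iterator<String> i = list.iterator();
    while (i.hasNext()) {
        String s = i.next();
        if (set.contains(s)) {
            i.remove(); // seen before (ignoring case): drop this later occurrence
        }
        else {
            set.add(s); // first occurrence: remember it and keep it in the list
        }
    }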

prints

[2, 1]
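To make the in-place behaviour concrete, here is a self-contained sketch of the same loop. The class name, the dedupInPlace helper and the sample input are made up (this is not the input behind the output quoted above, which the answer does not show). One thing worth knowing: the list has to support Iterator.remove(), so a fixed-size list straight from Arrays.asList would throw UnsupportedOperationException, which is why the sketch copies it into an ArrayList first.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class InPlaceDedupDemo {

    // Hypothetical helper: removes case-insensitive duplicates from the given list itself.
    static void dedupInPlace(List<String> list) {
        Set<String> seen = new TreeSet<String>(String.CASE_INSENSITIVE_ORDER);
        Iterator<String> it = list.iterator();
        while (it.hasNext()) {
            // add() returns false when an equal (ignoring case) string was already seen.
            if (!seen.add(it.next())) {
                it.remove();
            }
        }
    }

    public static void main(String[] args) {
        // Copy into an ArrayList so that Iterator.remove() is supported.
        List<String> list = new ArrayList<String>(Arrays.asList("b", "A", "a", "B"));
        dedupInPlace(list);
        System.out.println(list); // prints [b, A]
    }
}
```

The sketch folds the contains()/add() pair into a single add() call, which is a small variation on the loop in the answer rather than a change in behaviour.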

edited Dec 26, 2012 at 13:38

Tom Anderson

answered Dec 26, 2012 at 12:09

Evgeniy Dorofeev

1 Comment

Tom Anderson Over a year ago

If the asker doesn't care about preserving the order in the list, then this is an excellent answer. If he does, sadly, it is merely a good one.

Peter Lawrey · Accepted Answer · 2012-12-26 12:08:23Z

5

contains() is not called because LinkedHashSet.add() is not implemented in terms of it.

If you want add() to call contains(), you will need to override add() as well.

The reason it is not implemented this way is that calling contains() first would mean performing two lookups instead of one, which would be slower.
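A minimal sketch of what overriding add() as well could look like. The lowerCased helper set is my own assumption and not part of the asker's class; it is there because lowercasing the stored elements themselves would lose their original casing. The sketch also assumes non-null elements and leaves remove() untouched, so treat it as an illustration rather than a complete Set implementation.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

class CaseInsensitiveSet extends LinkedHashSet<String> {

    // Assumption: a side index of lowercased keys, used only for lookups.
    private final Set<String> lowerCased = new HashSet<String>();

    @Override
    public boolean contains(Object obj) {
        if (obj instanceof String) {
            return lowerCased.contains(((String) obj).toLowerCase(Locale.ROOT));
        }
        return false;
    }

    @Override
    public boolean add(String s) {
        // Explicitly consult contains(), which the inherited add() never does.
        if (contains(s)) {
            return false;
        }
        lowerCased.add(s.toLowerCase(Locale.ROOT));
        return super.add(s); // keeps the original spelling and insertion order
    }

    public static void main(String[] args) {
        Set<String> set = new CaseInsensitiveSet();
        // The inherited addAll() calls add() once per element, so the override is used.
        set.addAll(Arrays.asList("a", "A", "b", "B", "a"));
        System.out.println(set); // prints [a, b]
    }
}
```

As the answer points out, this costs an extra lookup per element compared with a plain add().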

answered Dec 26, 2012 at 12:08

Peter Lawrey

Rahul · Accepted Answer · 2012-12-27 06:17:53Z

3

The add() method of LinkedHashSet does not call contains() internally; otherwise your override would have been called as well.

Instead of a LinkedHashSet, why don't you use a SortedSet with a case-insensitive comparator, such as String.CASE_INSENSITIVE_ORDER?

Your code then reduces to

public static List<String> removeDupList(List<String>list, boolean ignoreCase){
    Set<String> set = (ignoreCase?new TreeSet<String>(String.CASE_INSENSITIVE_ORDER):new LinkedHashSet<String>());
    set.addAll(list);

    List<String> res = new ArrayList<String>(set);
    return res;
}

If you wish to preserve the order, as @tom anderson specified in his comment, you can use an auxiliary list to record it.

Try adding each element to the TreeSet: if add() returns true, also append the element to the list; otherwise skip it.

public static List<String> removeDupList(List<String> list) {
    Set<String> sortedSet = new TreeSet<String>(String.CASE_INSENSITIVE_ORDER);
    List<String> orderedList = new ArrayList<String>();
    for (String str : list) {
        if (sortedSet.add(str)) { // add returns true if it is not present already, else false
            orderedList.add(str);
        }
    }
    return orderedList;
}

edited Dec 27, 2012 at 6:17

answered Dec 26, 2012 at 12:18

Rahul

3 Comments

Tom Anderson Over a year ago

As I commented on Evgeniy's answer, this doesn't preserve the order of the strings in the list, which might or might not matter.

Rahul Over a year ago

Thanks for pointing it out. I have updated the code to preserve order of the elements.

Tom Anderson Over a year ago

Rather than building up orderedSet and then converting it to an ArrayList, you could just build up the ArrayList directly. The fact that orderedSet is a set is not actually needed here.

Yogesh Patil · Accepted Answer · 2012-12-26 12:34:30Z

0

Try

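    @Override
    public boolean addAll(Collection<? extends String> c) {
        boolean changed = false;
        for (String s : c) {
            // Route every element through contains() so the override is consulted.
            if (!this.contains(s)) {
                changed |= this.add(s);
            }
        }
        // Do not also call super.addAll(c): that would add every element again, unchecked.
        return changed;
    }

    @Override
    public boolean contains(Object o) {
        // Do your checking here
        return super.contains(o);
    }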

This will make sure the contains method is called if you want the code to go through there.

answered Dec 26, 2012 at 12:34

Yogesh Patil

Tom Anderson · Accepted Answer · 2012-12-26 13:31:22Z

0

Here's another approach, using a HashSet of the strings for deduplication, but building the result list directly:

public static List<String> removeDupList(List<String> list, boolean ignoreCase) {
    HashSet<String> seen = new HashSet<String>();
    ArrayList<String> deduplicatedList = new ArrayList<String>();
    for (String string : list) {
        if (seen.add(ignoreCase ? string.toLowerCase() : string)) {
            deduplicatedList.add(string);
        }
    }
    return deduplicatedList;
}

This is fairly simple, makes only one pass over the elements, and does only a lowercase, a hash lookup, and then a list append for each element.
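One caveat that may or may not matter for your data: toLowerCase() with no argument uses the JVM's default locale, and under a Turkish locale an uppercase "I" lowercases to a dotless "ı", so "TITLE" and "title" would no longer be treated as duplicates. Below is a sketch of the same approach with an explicit locale. Passing Locale.ROOT is my assumption about what is wanted here, and the class name and sample input are made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class LocaleSafeDedup {

    static List<String> removeDupList(List<String> list, boolean ignoreCase) {
        Set<String> seen = new HashSet<String>();
        List<String> deduplicatedList = new ArrayList<String>();
        for (String string : list) {
            // Locale.ROOT keeps the lowercasing independent of the machine's default locale.
            String key = ignoreCase ? string.toLowerCase(Locale.ROOT) : string;
            if (seen.add(key)) {
                deduplicatedList.add(string); // keep the original spelling, in original order
            }
        }
        return deduplicatedList;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("b", "A", "a", "B");
        System.out.println(removeDupList(input, true));  // prints [b, A]
        System.out.println(removeDupList(input, false)); // prints [b, A, a, B]
    }
}
```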

answered Dec 26, 2012 at 13:31

Tom Anderson
