
Update: I guess HashSet.add(Object obj) does not call contains(). Is there a way to implement what I want (removing duplicate strings, ignoring case, using a Set)?

Original question: I am trying to remove duplicates from a list of Strings in Java. However, in the following code CaseInsensitiveSet.contains(Object obj) is never called. Why?

public static List<String> removeDupList(List<String>list, boolean ignoreCase){
    Set<String> set = (ignoreCase?new CaseInsensitiveSet():new LinkedHashSet<String>());
    set.addAll(list);

    List<String> res = new Vector<String>(set);
    return res;
}


public class CaseInsensitiveSet  extends LinkedHashSet<String>{

    @Override
    public boolean contains(Object obj){
        //this not getting called.
        if(obj instanceof String){

            return super.contains(((String)obj).toLowerCase());
        }
        return super.contains(obj);
    }

}
  • Please learn how to use code formatting rather than using <pre> statements. Commented Dec 26, 2012 at 12:09
  • Why did you wipe out my edits? Commented Dec 26, 2012 at 12:10
  • Do you want to retain the order of the strings in the list? If so, for duplicates, do you want to take the position of the first or the last occurrence? Commented Dec 26, 2012 at 12:17

5 Answers

8

Try

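        // The comparator makes the set treat strings that differ only in case as equal
        Set<String> set = new TreeSet<String>(String.CASE_INSENSITIVE_ORDER);
        set.addAll(list);
        return new ArrayList<String>(set);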

UPDATE: As Tom Anderson mentioned, this does not preserve the initial order. If that is really an issue, try removing the duplicates from the list itself:

    Set<String> set = new TreeSet<String>(String.CASE_INSENSITIVE_ORDER);
    Iterator<String> i = list.iterator();
    while (i.hasNext()) {
        String s = i.next();
        if (set.contains(s)) {
            i.remove();
        }
        else {
            set.add(s);
        }
    }

prints

[2, 1]
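
To make the difference between the two snippets above concrete, here is a minimal, self-contained sketch. The class name, method names, and sample input are made up for illustration and are not part of the original answer. It assumes the list is mutable, since the second approach calls Iterator.remove().

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class DedupOrderDemo {

    // First approach: the TreeSet deduplicates and also sorts, ignoring case.
    static List<String> dedupSorted(List<String> list) {
        Set<String> set = new TreeSet<String>(String.CASE_INSENSITIVE_ORDER);
        set.addAll(list);
        return new ArrayList<String>(set);
    }

    // Second approach: remove later duplicates from the list itself,
    // so the first occurrence of each string keeps its position.
    static void dedupInPlace(List<String> list) {
        Set<String> seen = new TreeSet<String>(String.CASE_INSENSITIVE_ORDER);
        Iterator<String> i = list.iterator();
        while (i.hasNext()) {
            // add() returns false when an equal (ignoring case) string is already present
            if (!seen.add(i.next())) {
                i.remove();
            }
        }
    }

    public static void main(String[] args) {
        // Hypothetical sample input
        List<String> input = new ArrayList<String>(Arrays.asList("b", "A", "B", "a"));
        System.out.println(dedupSorted(input)); // [A, b]
        dedupInPlace(input);
        System.out.println(input);              // [b, A]
    }
}
```

The sorted variant returns the strings in case-insensitive order, while the in-place variant keeps them in their original order.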

1 Comment

If the asker doesn't care about preserving the order in the list, then this is an excellent answer. If he does, sadly, it is merely a good one.
5

contains() is not called because LinkedHashSet.add() is not implemented in terms of it.

If you want add() to call contains(), you will need to override add() as well.

The reason it is not implemented this way is that calling contains() first would mean performing two lookups instead of one, which would be slower.
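
As a rough sketch of what overriding add() could look like: the class below keeps the original strings in the LinkedHashSet (so insertion order and original casing survive) and tracks lowercased keys in a separate HashSet. The class name CaseInsensitiveLinkedSet and the lowerKeys field are hypothetical, not from the question. It assumes non-null elements, and it deliberately leaves remove() and clear() un-overridden, so it is only suitable for add-only use such as deduplication.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Set;

class CaseInsensitiveLinkedSet extends LinkedHashSet<String> {

    // Hypothetical helper: lowercased keys of everything added so far.
    private final Set<String> lowerKeys = new HashSet<String>();

    @Override
    public boolean add(String s) {
        // One lookup on the lowercased key; the original string is what gets stored.
        if (!lowerKeys.add(s.toLowerCase())) {
            return false;
        }
        return super.add(s);
    }

    @Override
    public boolean contains(Object obj) {
        if (obj instanceof String) {
            return lowerKeys.contains(((String) obj).toLowerCase());
        }
        return false;
    }

    public static void main(String[] args) {
        CaseInsensitiveLinkedSet set = new CaseInsensitiveLinkedSet();
        // addAll() is inherited from AbstractCollection and calls add() per element
        set.addAll(Arrays.asList("Foo", "bar", "FOO", "Bar"));
        System.out.println(set); // [Foo, bar]
    }
}
```

Because addAll() is inherited from AbstractCollection and delegates to add() for each element, overriding add() alone is enough for the set.addAll(list) call in the question to deduplicate case-insensitively.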


3

The add() method of LinkedHashSet does not call contains() internally; otherwise your method would have been called as well.

Instead of a LinkedHashSet, why don't you use a SortedSet with a case-insensitive comparator, such as String.CASE_INSENSITIVE_ORDER?

Your code is reduced to

public static List<String> removeDupList(List<String>list, boolean ignoreCase){
    Set<String> set = (ignoreCase?new TreeSet<String>(String.CASE_INSENSITIVE_ORDER):new LinkedHashSet<String>());
    set.addAll(list);

    List<String> res = new ArrayList<String>(set);
    return res;
}

If you wish to preserve the order, as @tom anderson pointed out in his comment, you can build the result in an auxiliary ArrayList.

Try adding each element to the TreeSet: if add() returns true, the element is new, so append it to the list as well; otherwise skip it.

public static List<String> removeDupList(List<String> list) {
    Set<String> sortedSet = new TreeSet<String>(String.CASE_INSENSITIVE_ORDER);
    List<String> orderedList = new ArrayList<String>();
    for (String str : list) {
        if (sortedSet.add(str)) { // add returns true if str is not already present, else false
            orderedList.add(str);
        }
    }
    return orderedList;
}

3 Comments

As I commented on Evgeniy's answer, this doesn't preserve the order of the strings in the list, which might or might not matter.
Thanks for pointing it out. I have updated the code to preserve order of the elements.
Rather than building up orderedSet and then converting it to an ArrayList, you could just build up the ArrayList directly. The fact that orderedSet is a set is not actually needed here.
0

Try

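    @Override
    public boolean addAll(Collection<? extends String> c) {
        boolean changed = false;
        for (String s : c) {
            // Only add elements that contains() does not already report
            if (!this.contains(s)) {
                changed |= this.add(s);
            }
        }
        // Do not also call super.addAll(c): it would re-add every element without the check
        return changed;
    }

    @Override
    public boolean contains(Object o) {
        // Do your checking here
        return super.contains(o);
    }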

This will make sure the contains method is called if you want the code to go through there.


0

Here's another approach, using a HashSet of the strings for deduplication, but building the result list directly:

public static List<String> removeDupList(List<String> list, boolean ignoreCase) {
    HashSet<String> seen = new HashSet<String>();
    ArrayList<String> deduplicatedList = new ArrayList<String>();
    for (String string : list) {
        if (seen.add(ignoreCase ? string.toLowerCase() : string)) {
            deduplicatedList.add(string);
        }
    }
    return deduplicatedList;
}

This is fairly simple, makes only one pass over the elements, and does only a lowercase, a hash lookup, and then a list append for each element.
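
For illustration, here is a small usage sketch of the same idea with made-up sample input. One deliberate difference from the code above, and an assumption on my part rather than something the answer states: it passes Locale.ROOT to toLowerCase(), because the no-argument version uses the default locale and can lowercase some letters differently (for example under a Turkish locale). The class name RemoveDupUsage is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class RemoveDupUsage {

    static List<String> removeDupList(List<String> list, boolean ignoreCase) {
        Set<String> seen = new HashSet<String>();
        List<String> deduplicatedList = new ArrayList<String>();
        for (String string : list) {
            // Assumption: Locale.ROOT keeps the key independent of the default locale
            String key = ignoreCase ? string.toLowerCase(Locale.ROOT) : string;
            if (seen.add(key)) {
                // First occurrence wins and keeps its original casing and position
                deduplicatedList.add(string);
            }
        }
        return deduplicatedList;
    }

    public static void main(String[] args) {
        // Hypothetical sample input
        List<String> input = Arrays.asList("Foo", "bar", "FOO", "Bar", "baz");
        System.out.println(removeDupList(input, true));  // [Foo, bar, baz]
        System.out.println(removeDupList(input, false)); // [Foo, bar, FOO, Bar, baz]
    }
}
```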
