10

Let's say you have a class and you create a HashSet that can store instances of this class. If you try to add instances which are equal, only one instance is kept in the collection, and that is fine.

However, if you have two different instances in the HashSet, and you take one and make it an exact copy of the other (by copying the fields), the HashSet will then contain two duplicate instances.

Here is the code which demonstrates this:

    public static void main(String[] args)
    {
        HashSet<GraphEdge> set = new HashSet<>();
        GraphEdge edge1 = new GraphEdge(1, "a");
        GraphEdge edge2 = new GraphEdge(2, "b");
        GraphEdge edge3 = new GraphEdge(3, "c");

        set.add(edge1);
        set.add(edge2);
        set.add(edge3);

        edge2.setId(1);
        edge2.setName("a");

        for(GraphEdge edge: set)
        {
            System.out.println(edge.toString());
        }

        if(edge2.equals(edge1))
        {
            System.out.println("Equals");
        }
        else
        {
            System.out.println("Not Equals");
        }
    }

    public class GraphEdge
    {
        private int id;
        private String name;

        //Constructor ...

        //Getters & Setters...

        public int hashCode()
        {
            int hash = 7;
            hash = 47 * hash + this.id;
            hash = 47 * hash + Objects.hashCode(this.name);
            return hash;
        }

        public boolean equals(Object o)
        {
            if(o == this)
            {
                return true;
            }

            if(o instanceof GraphEdge)
            {
                GraphEdge anotherGraphEdge = (GraphEdge) o;
                // Objects.equals tolerates a null name, matching Objects.hashCode in hashCode()
                if(anotherGraphEdge.getId() == this.id && Objects.equals(anotherGraphEdge.getName(), this.name))
                {
                    return true;
                }
            }

            return false;
        }
    }

The output from the above code:

1 a
1 a
3 c
Equals

Is there a way to force the HashSet to validate its contents so that possible duplicate entries created as in the above scenario get removed?

A possible solution could be to create a new HashSet and copy the contents from one HashSet to the other, so that the new HashSet won't contain duplicates. However, I don't like this solution.

7 Answers

18

The situation you describe is invalid. See the Javadoc: "The behavior of a set is not specified if the value of an object is changed in a manner that affects equals comparisons while the object is an element in the set."

3 Comments

Okay so the above scenario is invalid. I guess the only option is to copy the contents into a new HashSet.
@Spi1988 The correct solution is to stick to the contract of Set and not modify objects after adding them to the collection.
@PB_MLT what will you achieve by copying the contents into new HashSet?
4

To add to @EJP's answer: what will happen in practice if you mutate objects in a HashSet to make them duplicates (in the sense of the equals / hashCode contract) is that the hash table data structure will break.

  • Depending on the exact details of the mutation and the state of the hash table, one or both of the instances will become invisible to lookup (e.g. contains and other operations), either because the instance is now on the wrong hash chain or because the other instance appears before it on the same chain. It is hard to predict which instance will be visible ... and whether it will remain visible.

  • If you iterate the set, both instances will still be present ... in violation of the Set contract.

Of course, this is very broken from the application perspective.
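
The two symptoms above can be reproduced with a small sketch. `MutableEdge` and `BrokenSetDemo` are hypothetical names, standing in for the question's GraphEdge reduced to what the demo needs. The `contains` result is what OpenJDK's HashMap-backed HashSet gives; since the Javadoc leaves the behaviour unspecified, other implementations may differ.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

class BrokenSetDemo {
    // Counts, by iteration rather than lookup, the elements equal to the probe.
    static int countEqual(Set<MutableEdge> set, MutableEdge probe) {
        int count = 0;
        for (MutableEdge e : set) {
            if (e.equals(probe)) count++;
        }
        return count;
    }

    public static void main(String[] args) {
        Set<MutableEdge> set = new HashSet<>();
        MutableEdge edge1 = new MutableEdge(1, "a");
        MutableEdge edge2 = new MutableEdge(2, "b");
        MutableEdge edge3 = new MutableEdge(3, "c");
        set.add(edge1);
        set.add(edge2);
        set.add(edge3);

        // Make edge2 equal to edge1 while it is stored in the set.
        edge2.setId(1);
        edge2.setName("a");
        System.out.println(set.size());             // 3: the set does not notice
        System.out.println(countEqual(set, edge1)); // 2: iteration sees both copies

        // Change edge3 to a value that equals nothing else in the set.
        edge3.setId(30);
        System.out.println(set.contains(edge3));    // false on OpenJDK: stored under the old hash
    }
}

// Hypothetical stand-in for GraphEdge (assumption: id and name define equality).
final class MutableEdge {
    private int id;
    private String name;

    MutableEdge(int id, String name) {
        this.id = id;
        this.name = name;
    }

    void setId(int id) { this.id = id; }
    void setName(String name) { this.name = name; }

    @Override
    public int hashCode() {
        return Objects.hash(id, name);
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof MutableEdge)) return false;
        MutableEdge other = (MutableEdge) o;
        return id == other.id && Objects.equals(name, other.name);
    }
}
```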


You can avoid this problem by either:

  • using an immutable type for your set elements,
  • making a copy of the objects as you put them into the set and / or pull them out of the set,
  • writing your code so that it "knows" not to change the objects for the duration ...

From the perspective of correctness and robustness, the first option is clearly best.
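
If the element type has to stay mutable, one way to follow the third option is to take the element out of the set before changing it and put it back afterwards. The following is only a sketch under that assumption; `Edge`, `SetUpdates` and `update` are hypothetical names, not part of the JDK.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;
import java.util.function.Consumer;

final class SetUpdates {
    // Removes the element, applies the change, then re-adds it so that it is
    // stored under its new hash code. Returns false if the changed element
    // equals one that is already in the set, in which case it is not re-added.
    // Note: remove() matches by equals(), so pass the stored instance itself.
    static <T> boolean update(Set<T> set, T element, Consumer<? super T> change) {
        if (!set.remove(element)) {
            throw new IllegalArgumentException("element is not in the set");
        }
        change.accept(element);
        return set.add(element);
    }

    public static void main(String[] args) {
        Set<Edge> set = new HashSet<>();
        Edge edge1 = new Edge(1, "a");
        Edge edge2 = new Edge(2, "b");
        set.add(edge1);
        set.add(edge2);

        // edge2 becomes equal to edge1, so it is dropped instead of duplicated.
        boolean reAdded = update(set, edge2, e -> { e.setId(1); e.setName("a"); });
        System.out.println(reAdded);    // false
        System.out.println(set.size()); // 1
    }
}

// Hypothetical mutable element type (assumption: id and name define equality).
final class Edge {
    private int id;
    private String name;

    Edge(int id, String name) {
        this.id = id;
        this.name = name;
    }

    void setId(int id) { this.id = id; }
    void setName(String name) { this.name = name; }

    @Override
    public int hashCode() {
        return Objects.hash(id, name);
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Edge)) return false;
        Edge other = (Edge) o;
        return id == other.id && Objects.equals(name, other.name);
    }
}
```

This only helps if every mutation goes through such a helper; a direct setter call on a stored element still breaks the set.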


Incidentally, it would be really difficult to "fix" this in a general way. There is no pervasive mechanism in Java for knowing ... or being notified ... that some element has changed. You can implement such a mechanism on a class by class basis, but it has to be coded explicitly (and it won't be cheap). Even if you did have such a mechanism, what would you do? Clearly one of the objects should now be removed from the set ... but which one?

4 Comments

Thx for the explanation. If you had a mechanism which could detect that an object in a set changed, and is now equal to another object present in the same set, then you could just remove any one of the duplicates (it doesn't matter which one you remove since they are equal).
@Spi1988 - "it doesn't matter which one you remove since they are equal". That is not true in general. Two objects for which equals() return true need not be identical. And it could be important which one you drop. And the mechanism you posit is hypothetical.
Thank you, I was struggling with this for hours now. But honestly, this whole problem only happens because the implementation was too lazy to make a proper HashSet rather than just back it up by a HashTable thus freezing hashCode indexing to creation time. As far as I'm concerned this HashSet they give us is not a HashSet, but an ImmutableHashSet and a proper HashSet implementation is still missing from the jdk, this is outrageous really - it caches!!!! wow.
@PedroBorges You're on your own there. Everybody else manages with it OK, and every other hash table implementation I've ever seen has the same behaviour, including several I wrote myself. You just don't really need to be able to modify the keys, and if you do you have to inform the table about it: otherwise it doesn't know and it breaks. The simple solution to that is delete-modify-insert.
1

You are correct, and I don't think there is any way to protect against the case you discuss. All collections that use hashing and equals are subject to this problem. The collection receives no notification that the object has changed since it was added to the collection. I think the solution you outline is good.

If you are so concerned with this issue, perhaps you need to rethink your data structures. You could use immutable objects for instance. With immutable objects you would not have this problem.

1

HashSet is not aware of its members' properties changing after the object has been added. If this is a problem for you, then you may want to consider making GraphEdge immutable. For example:

GraphEdge edge4 = edge2.changeName("new_name");

In the case where GraphEdge is immutable, changing a value results in returning a new instance rather than changing the existing instance.
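
A minimal sketch of what such an immutable class might look like, assuming id and name are the only fields. `ImmutableGraphEdge` and `ImmutableEdgeDemo` are hypothetical names; `changeName` follows the call shown above.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

class ImmutableEdgeDemo {
    public static void main(String[] args) {
        Set<ImmutableGraphEdge> set = new HashSet<>();
        ImmutableGraphEdge edge2 = new ImmutableGraphEdge(2, "b");
        set.add(edge2);

        // changeName returns a new object; the one stored in the set is untouched.
        ImmutableGraphEdge edge4 = edge2.changeName("new_name");
        System.out.println(set.contains(edge2)); // true
        System.out.println(set.contains(edge4)); // false
    }
}

// Hypothetical immutable variant of GraphEdge: final fields, no setters.
final class ImmutableGraphEdge {
    private final int id;
    private final String name;

    ImmutableGraphEdge(int id, String name) {
        this.id = id;
        this.name = name;
    }

    int getId() { return id; }
    String getName() { return name; }

    // "Changing" the name creates a new instance with the same id.
    ImmutableGraphEdge changeName(String newName) {
        return new ImmutableGraphEdge(id, newName);
    }

    @Override
    public int hashCode() {
        return Objects.hash(id, name);
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof ImmutableGraphEdge)) return false;
        ImmutableGraphEdge other = (ImmutableGraphEdge) o;
        return id == other.id && Objects.equals(name, other.name);
    }
}
```

Because the stored instance never changes, its hash code stays valid for as long as it is in the set. Replacing it means removing the old instance and adding the new one.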

0

This is a method that can be used to print the elements of a LinkedList of String objects, without any duplicate elements. The method takes a LinkedList object as an input, and then creates a new HashSet object. The method then iterates through the elements of the input LinkedList, and adds each element to the HashSet. Since a HashSet does not allow duplicate elements, this ensures that only unique elements are added to the HashSet.

Then, the method iterates through the HashSet and prints each element to the console, separated by a space. Unlike the printList method, this method does not add any newlines before or after the list of elements. It simply prints the string "Non-duplicates are: " followed by the elements of the HashSet.

public static void printSetList(LinkedList<String> list) {
    Set<String> hashSet = new HashSet<>();
    for (String v : list) {
        hashSet.add(v);
    }
    System.out.print("Non-duplicates are: ");
    for (String v : hashSet) {
        System.out.print(v + " ");
    }
}

1 Comment

Why not use hashSet.addAll(list)?
-1

Objects.hashCode is meant to be used to generate a hash code from parameter objects. You are using it as part of the hash code calculation.

Try replacing your implementation of hashCode with the following:

public int hashCode()
{
    return Objects.hashCode(this.id, this.name);
}
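
For readers using only the standard library: `java.util.Objects.hashCode(Object)` accepts a single argument, and the varargs method there is `Objects.hash(Object...)`. A sketch under that assumption follows; `HashedEdge` and `HashedEdgeDemo` are hypothetical names. It only changes how the hash is computed and does nothing about elements being mutated while they are in a set.

```java
import java.util.Objects;

class HashedEdgeDemo {
    public static void main(String[] args) {
        HashedEdge first = new HashedEdge(1, "a");
        HashedEdge second = new HashedEdge(1, "a");
        // Equal objects must produce equal hash codes.
        System.out.println(first.equals(second));                  // true
        System.out.println(first.hashCode() == second.hashCode()); // true
    }
}

// Hypothetical stand-in for GraphEdge using java.util.Objects only.
final class HashedEdge {
    private final int id;
    private final String name;

    HashedEdge(int id, String name) {
        this.id = id;
        this.name = name;
    }

    @Override
    public int hashCode() {
        // Objects.hash combines all arguments and tolerates a null name.
        return Objects.hash(id, name);
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof HashedEdge)) return false;
        HashedEdge other = (HashedEdge) o;
        return id == other.id && Objects.equals(name, other.name);
    }
}
```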

3 Comments

Objects.hashCode(this.id, this.name) is not valid since the hashCode method only takes one object.
I assumed you were using the Google Collections library:
-1

You will need to do the unique detection at the time you iterate your list. Making a new HashSet might not seem the right way to go, but why not try this... And maybe not use a HashSet to start with...

public class TestIterator {
    public static void main(String[] args) {
        List<String> list = new ArrayList<String>();

        list.add("1");
        list.add("1");
        list.add("2");
        list.add("3");

        for (String s : new UniqueIterator<String>(list)) {
            System.out.println(s);
        }
    }
}

public class UniqueIterator<T> implements Iterable<T> {
    private Set<T> hashSet = new HashSet<T>();

    public UniqueIterator(Iterable<T> iterable) {
        for (T t : iterable) {
            hashSet.add(t);
        }
    }

    public Iterator<T> iterator() {
        return hashSet.iterator();
    }
}

5 Comments

He doesn't have a list. He has a set. He is misusing it. Not an answer.
He is using a set as a list. So he needs to use the set properly OR use a list.
He doesn't want a list. He wants a set. He has a set. He is misusing it, and then wondering why its elements aren't unique. The solution is not to make it worse, but to stop it happening in the first place.
I don't want a set (as implemented in Java), since it is not suitable for my needs. I want a collection which is sensitive to the changes performed on its elements so as not to allow any duplicates ever.
Short of some crazy Hibernate-style runtime hooking of set methods, there is no way a collection can instantly react to changes to its contents. Do what EJP and others suggest: use immutable objects and use the set properly (adding and removing as required). You could wrap this up into a class of your own to make it more convenient for the caller.
