0

I get a set of elements by parsing an HTML document. The elements may contain duplicates. What is the best way to list only the unique elements?

I come from a C++ background, where I would do this with a set and a custom equality operation. However, I am not sure how to do it in Java. I would appreciate any code that helps me do it the right and efficient way.

ArrayList<Element> values = new ArrayList<Element>();

// Parse the html and get the document
Document doc = Jsoup.parse(htmlDataInStringFormat);

// Go through each selector and find all matching elements
for ( String selector: selectors ) {

    //Find elements matching this selector
    Elements elements = doc.select(selector);

    //If there are no matching elements, proceed to next selector
    if ( 0 == elements.size() ) continue;

    for (Element elem: elements ){
        values.add(elem);
    }
}

if ( values.size() > 0 ) {
    // ????? Need to remove duplicates from values here
}
1
  • The solutions posted will work, but only if you override equals() and hashCode() for class Element. Commented Dec 4, 2013 at 17:45

5 Answers

3

java.util.HashSet will give you an unordered set. There are also other implementations of java.util.Set in the API that give you ordered sets or concurrent behaviour if needed.

Depending on what the class Element is, you may additionally need to implement the equals and hashCode methods on it, as per the comment by @musical_coder.

For example:

Set<Element> set = new HashSet<Element>(elements);

To provide an overridden equals method for Element, I would create a thin wrapper around the Element class, called ElementWrapper or something similarly sensible, for example:

    public static class ElementWrapper {

        private final Element element;

        public ElementWrapper(Element element) {
            this.element = element;
        }

        public Element getElement() {
            return element;
        }

        // Override equals() and hashCode() here, based on whatever
        // makes two elements "the same" for your purposes
    }

Wrap each of your elements in this class and put the wrappers into a Set<ElementWrapper>. Because the wrapper holds the element rather than extending it, this works even if the Element class is final.
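
A minimal sketch of such a wrapper, filled in far enough to run. Everything here beyond the wrapper idea is an assumption: FakeElement is a hypothetical stand-in for a third-party class that cannot be modified, and treating two elements as equal when their outerHtml() strings match is only one possible definition of "duplicate". LinkedHashSet is used so that the first occurrence of each element is kept in its original position.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for a third-party class we cannot modify.
final class FakeElement {
    private final String html;
    FakeElement(String html) { this.html = html; }
    String outerHtml() { return html; }
}

// Thin wrapper that supplies the equality the wrapped class lacks.
final class ElementWrapper {
    private final FakeElement element;

    ElementWrapper(FakeElement element) { this.element = element; }

    FakeElement getElement() { return element; }

    // Assumption: two elements are duplicates when their HTML text matches.
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof ElementWrapper)) return false;
        ElementWrapper other = (ElementWrapper) o;
        return element.outerHtml().equals(other.element.outerHtml());
    }

    // Must be derived from the same data as equals().
    @Override
    public int hashCode() { return element.outerHtml().hashCode(); }
}

class WrapperDedupDemo {
    // Returns the input without duplicates, keeping first occurrences in order.
    static List<FakeElement> unique(List<FakeElement> input) {
        Set<ElementWrapper> seen = new LinkedHashSet<>();
        for (FakeElement e : input) {
            seen.add(new ElementWrapper(e)); // a duplicate is silently ignored
        }
        List<FakeElement> result = new ArrayList<>();
        for (ElementWrapper w : seen) {
            result.add(w.getElement());
        }
        return result;
    }

    public static void main(String[] args) {
        List<FakeElement> input = new ArrayList<>();
        input.add(new FakeElement("<p>a</p>"));
        input.add(new FakeElement("<p>b</p>"));
        input.add(new FakeElement("<p>a</p>"));
        for (FakeElement e : unique(input)) {
            System.out.println(e.outerHtml());
        }
    }
}
```

The wrapper keeps the equality rule in one place, so switching to a different notion of "same element" only means changing equals() and hashCode().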


8 Comments

Thanks. Would it work if I add them directly into a set, like Set<Element> values = new HashSet<Element>();, rather than moving the elements into a set afterwards?
@Kiran Probably yes; it depends on what else you intend to use it for. If the sort order is important to you, consider the alternative TreeSet, as MSach suggests. They are all Collections (including List), so you can iterate over them in the same way.
Thanks. I don't need a sorted order, I just need to avoid duplicates. In fact, I need to avoid sorting. I will stick to HashSet.
Is there a way to provide a custom comparator? I don't have access to the Element class to change the equals method :(
@Kiran Does the class need a different equals from the one it already implements? I don't know that library well; there may be one in place already. But no, the default HashSet does not allow you to replace the equals method.
2

Add the elements to a java.util.HashSet and it will contain only unique elements.


1

Use HashSet if you just want to avoid duplicates. Use TreeSet if you want ordering along with avoiding duplicates.
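
The difference can be sketched with plain strings, which already have suitable equals, hashCode and natural ordering. LinkedHashSet is not mentioned above; it is included here as a third option for the case where the original insertion order should survive. The class and method names are made up for this illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class SetChoiceDemo {
    // Unique values with no guarantee about iteration order.
    static Set<String> anyOrderUnique(List<String> input) {
        return new HashSet<>(input);
    }

    // Unique values, iterated in natural (sorted) order.
    static Set<String> sortedUnique(List<String> input) {
        return new TreeSet<>(input);
    }

    // Unique values, iterated in the order they were first seen.
    static Set<String> insertionOrderUnique(List<String> input) {
        return new LinkedHashSet<>(input);
    }

    public static void main(String[] args) {
        List<String> tags = Arrays.asList("p", "div", "a", "p", "div");
        System.out.println(anyOrderUnique(tags).size());  // 3
        System.out.println(sortedUnique(tags));           // [a, div, p]
        System.out.println(insertionOrderUnique(tags));   // [p, div, a]
    }
}
```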


0

Additionally, override the equals and hashCode methods of Element:

class Element {
    ...

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof Element)) {
            return false;
        }
        Element other = (Element) o;
        // compare the fields of this and other, for example:
        if (other.a != this.a) { return false; }
        ...
        return true;
    }
    ...
    @Override
    public int hashCode() {
        // compute a value that is the same for any two equal objects
    }
}
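
A complete version of this pattern might look like the sketch below. Item is a hypothetical class with two made-up fields (a and name); it only illustrates the contract that equals() and hashCode() must be computed from the same fields, and it applies only when you control the class's source.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical class whose source we are free to edit.
final class Item {
    private final int a;
    private final String name;

    Item(int a, String name) {
        this.a = a;
        this.name = name;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Item)) return false;
        Item other = (Item) o;
        // Two items are equal when every compared field matches.
        return a == other.a && Objects.equals(name, other.name);
    }

    @Override
    public int hashCode() {
        // Built from exactly the fields that equals() compares.
        return Objects.hash(a, name);
    }
}

class EqualsHashCodeDemo {
    public static void main(String[] args) {
        Set<Item> items = new HashSet<>();
        items.add(new Item(1, "x"));
        items.add(new Item(1, "x")); // equal to the first, so not stored again
        items.add(new Item(2, "x"));
        System.out.println(items.size()); // 2
    }
}
```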


0

While the answers posted work if it is possible to modify the Element class, I cannot do that. I do not need a sorted set, but here is the solution I found:

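TreeSet<Element> nt = new TreeSet<Element>(new Comparator<Element>(){
        public int compare(Element a, Element b){
            if ( a == b )
                return 0;
            if ( a.val > b.val )
                return 1;
            if ( a.val < b.val )
                return -1;
            // Equal values must compare as 0, otherwise the TreeSet
            // keeps both elements and the duplicate is not removed
            return 0;
        }
    });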

for (Element elem: elements ){
    nt.add(elem);
}
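
The same idea can be sketched in a self-contained form. Valued is a hypothetical stand-in for a class that cannot be modified, and its int field val and String field label are assumptions made for the example. The point to note is that a TreeSet treats two objects as duplicates exactly when the comparator returns 0, so the comparator has to return 0 for equal keys.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

// Hypothetical stand-in for a class we cannot modify.
final class Valued {
    final int val;
    final String label;

    Valued(int val, String label) {
        this.val = val;
        this.label = label;
    }
}

class ComparatorDedupDemo {
    // Returns one element per distinct val, sorted by val.
    static List<Valued> uniqueByVal(List<Valued> input) {
        // comparingInt returns 0 for equal keys and avoids subtraction overflow.
        Comparator<Valued> byVal = Comparator.comparingInt(v -> v.val);
        TreeSet<Valued> set = new TreeSet<>(byVal);
        set.addAll(input); // an element comparing as 0 to a stored one is skipped
        return new ArrayList<>(set);
    }

    public static void main(String[] args) {
        List<Valued> input = new ArrayList<>();
        input.add(new Valued(3, "first"));
        input.add(new Valued(1, "other"));
        input.add(new Valued(3, "second"));
        for (Valued v : uniqueByVal(input)) {
            System.out.println(v.val + " " + v.label);
        }
    }
}
```

The result comes back sorted by val rather than in insertion order, which is the trade-off of using a TreeSet for deduplication.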
