I was given an XML file and its schema. My goal is to output all the data from the XML (without duplicates) and order the list by date of birth. So far I have managed to print all the data (with duplicates), but I don't know what to do next. I've tried various things without success.
3 Answers
HashSet will depend on the Node.equals() method to determine equality, and you're adding distinct nodes, albeit with the same underlying text. From the doc:
adds the specified element e to this set if this set contains no element e2 such that (e==null ? e2==null : e.equals(e2))
I would extract the underlying text (String) from the Node, and a HashSet<String> will determine uniqueness correctly.
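As a rough illustration of that suggestion, the sketch below parses a small inline document and collects the text of each `firstname` element into a `HashSet<String>`. The sample XML, the class name `TextDedupSketch` and the helper method names are made up for the example; only the tag name `firstname` is taken from the question's data.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashSet;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

final class TextDedupSketch {

    // Hypothetical sample data; the asker's real file is not shown.
    static final String XML =
            "<people>"
            + "<isik><firstname>Mari</firstname></isik>"
            + "<isik><firstname>Mari</firstname></isik>"
            + "<isik><firstname>Jaan</firstname></isik>"
            + "</people>";

    // Parses the sample document and returns every <firstname> element.
    static NodeList firstNames() {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(XML.getBytes(StandardCharsets.UTF_8)));
            return doc.getElementsByTagName("firstname");
        } catch (Exception ex) {
            throw new IllegalStateException("Failed to parse sample XML", ex);
        }
    }

    // Adds the text of each node, not the node itself, so that
    // String.equals() and String.hashCode() decide what counts as a duplicate.
    static Set<String> distinctTexts() {
        NodeList list = firstNames();
        Set<String> texts = new HashSet<>();
        for (int i = 0; i < list.getLength(); i++) {
            texts.add(list.item(i).getTextContent());
        }
        return texts;
    }

    public static void main(String[] args) {
        System.out.println("Elements found: " + firstNames().getLength()); // 3
        System.out.println("Distinct texts: " + distinctTexts().size());   // 2
    }
}
```

A `HashSet` does not keep any particular order, so this only solves the duplicate part of the problem; sorting still has to be done separately.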
EDIT
After reading the post again I realised I need to remove duplicates too, so:
You can use a TreeSet to enforce uniqueness and sort by DOB. I presume that two entries with the same first name, surname and date of birth are the same person.
First I would wrap your Node in a class that implements Comparable and also extracts all the properties you need. The wrapper has to implement Comparable because TreeSet uses compareTo both to decide whether two elements are different (a.compareTo(b) != 0) and to order them.
// Imports needed at the top of the enclosing source file:
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

public static final class NodeWrapper implements Comparable<NodeWrapper> {

    // Note: SimpleDateFormat is not thread-safe; this is fine for single-threaded use.
    private static final SimpleDateFormat DOB_FORMAT = new SimpleDateFormat("yyyy-MM-dd");
    private final Element element;
    private final Date dob;
    private final String firstName;
    private final String surName;
    private final String sex;

    public NodeWrapper(final Node node) {
        this.element = (Element) node;
        try {
            this.dob = DOB_FORMAT.parse(initDateOfBirth());
        } catch (ParseException ex) {
            throw new RuntimeException("Failed to parse dob", ex);
        }
        this.firstName = initFirstName();
        this.surName = initSurnameName();
        this.sex = initSex();
    }

    private String initFirstName() {
        return getNodeValue("firstname");
    }

    private String initSurnameName() {
        return getNodeValue("surname");
    }

    private String initDateOfBirth() {
        return getNodeValue("dateofbirth");
    }

    private String initSex() {
        return getNodeValue("sex");
    }

    // Returns the text of the first child element with the given tag name.
    private String getNodeValue(final String name) {
        return element.getElementsByTagName(name).item(0).getTextContent();
    }

    public Node getNode() {
        return element;
    }

    Date getDob() {
        return dob;
    }

    public String getFirstName() {
        return firstName;
    }

    public String getSurName() {
        return surName;
    }

    public String getDateOfBirth() {
        return DOB_FORMAT.format(dob);
    }

    public String getSex() {
        return sex;
    }

    // Orders by date of birth, then surname, then first name.
    @Override
    public int compareTo(NodeWrapper o) {
        int c;
        c = getDob().compareTo(o.getDob());
        if (c != 0) {
            return c;
        }
        c = getSurName().compareTo(o.getSurName());
        if (c != 0) {
            return c;
        }
        return getFirstName().compareTo(o.getFirstName());
    }

    @Override
    public int hashCode() {
        int hash = 5;
        hash = 47 * hash + (this.dob != null ? this.dob.hashCode() : 0);
        hash = 47 * hash + (this.firstName != null ? this.firstName.hashCode() : 0);
        hash = 47 * hash + (this.surName != null ? this.surName.hashCode() : 0);
        return hash;
    }

    // Uses the same three fields as compareTo, so the two stay consistent.
    @Override
    public boolean equals(Object obj) {
        if (obj == null) {
            return false;
        }
        if (getClass() != obj.getClass()) {
            return false;
        }
        final NodeWrapper other = (NodeWrapper) obj;
        if (this.dob != other.dob && (this.dob == null || !this.dob.equals(other.dob))) {
            return false;
        }
        if ((this.firstName == null) ? (other.firstName != null) : !this.firstName.equals(other.firstName)) {
            return false;
        }
        if ((this.surName == null) ? (other.surName != null) : !this.surName.equals(other.surName)) {
            return false;
        }
        return true;
    }

    @Override
    public String toString() {
        return "FirstName: " + getFirstName() + ". Surname: " + getSurName() + ". DOB: " + getDateOfBirth() + ". Sex: " + getSex() + ".";
    }
}
So if the date of birth, surname and first name are all equal, we assume it is the same person and return 0. When compareTo is used this way, it is good practice to keep it consistent with equals, so that a.compareTo(b) == 0 exactly when a.equals(b). I have added the matching equals and hashCode methods for that reason.
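To see why that consistency matters, here is a small sketch using `BigDecimal`, a JDK class whose `compareTo` is documented as not consistent with `equals` (`1.0` and `1.00` compare as equal but are not `equals`). The class name `ConsistencySketch` is just an illustrative name, not part of the code above.

```java
import java.math.BigDecimal;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

final class ConsistencySketch {

    // HashSet relies on equals()/hashCode(): 1.0 and 1.00 differ in scale,
    // so they are not equal and both are kept.
    static int hashSetSize() {
        Set<BigDecimal> set = new HashSet<>();
        set.add(new BigDecimal("1.0"));
        set.add(new BigDecimal("1.00"));
        return set.size();
    }

    // TreeSet relies on compareTo(): 1.0 and 1.00 compare as 0,
    // so the second add is treated as a duplicate and ignored.
    static int treeSetSize() {
        Set<BigDecimal> set = new TreeSet<>();
        set.add(new BigDecimal("1.0"));
        set.add(new BigDecimal("1.00"));
        return set.size();
    }

    public static void main(String[] args) {
        System.out.println("HashSet size: " + hashSetSize()); // 2
        System.out.println("TreeSet size: " + treeSetSize()); // 1
    }
}
```

With a class where the two methods agree, such as the wrapper above, both kinds of set should report the same elements as duplicates.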
Now you can use a TreeSet in your code, which will automatically sort and guarantee uniqueness:
final Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(new File("file.xml"));
final Set<NodeWrapper> inimesteList = new TreeSet<NodeWrapper>();

final NodeList isa = doc.getElementsByTagName("isa");
for (int i = 0; i < isa.getLength(); i++) {
    inimesteList.add(new NodeWrapper(isa.item(i)));
}

final NodeList ema = doc.getElementsByTagName("ema");
for (int i = 0; i < ema.getLength(); i++) {
    inimesteList.add(new NodeWrapper(ema.item(i)));
}

final NodeList isik = doc.getElementsByTagName("isik");
for (int i = 0; i < isik.getLength(); i++) {
    inimesteList.add(new NodeWrapper(isik.item(i)));
}

System.out.println();
System.out.println("Total: " + inimesteList.size());
for (final NodeWrapper nw : inimesteList) {
    System.out.println(nw);
}
I have also added a toString method and used that to print the nodes - this makes the code much cleaner.
The Document approach, while it seems simpler than JAXB, is riddled with this sort of tedium. As you already have a schema, I would strongly recommend moving to xjc and JAXB unmarshalling; it will make this sort of thing hundreds of times easier.
8 Comments
Node and Element - do you have the right imports? They should be, as in your snippet, from org.w3c.dom. Of course you cannot do Element node = (Element) inimesteList.get(i). First of all, it should not be a List but a Set, so there is no get method. Secondly, NodeWrapper is not a Node! You need to get the wrapped Node using getNode.
The constructor NodeWrapper(ode)..., is this a typo? What is "ode"?
NodeWrapper class, in another class file? In which case check its imports.
static modifier from the class definition in the NodeWrapper.java? I suspect the auto imports for the new class are a bit wrong - they need to be the same as in harjutus.java; i.e. all the imports need to be from org.w3c.dom. Just edit your post and post up the new files.
Element node = (Element) inimesteList.get(i)? You cannot get by index on a set. You need to use an enhanced for loop.
It's better to create a Java Bean (POJO) holding the details of a single node and override equals() and hashCode() in it. Store all the node data in a List of these beans, then use a LinkedHashSet to remove duplicates. To sort, either implement Comparable or use a Comparator with Collections.sort().
Alternatively, extend or encapsulate Node in another class and override equals() and hashCode() in it. Store all the nodes in a List of instances of the new class, then use a LinkedHashSet to remove duplicates. To sort, either implement Comparable or use a Comparator with Collections.sort().
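A minimal sketch of the bean approach might look like the following. The `Person` class, its fields, the sample names and the `dedupeAndSort` helper are all hypothetical; it assumes the date of birth has already been read from the XML as an ISO `yyyy-MM-dd` string.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Objects;

final class PersonBeanSketch {

    // Hypothetical bean holding the details of one person node.
    static final class Person {
        final String firstName;
        final String surName;
        final LocalDate dateOfBirth;

        Person(String firstName, String surName, String isoDateOfBirth) {
            this.firstName = firstName;
            this.surName = surName;
            this.dateOfBirth = LocalDate.parse(isoDateOfBirth);
        }

        @Override
        public boolean equals(Object obj) {
            if (this == obj) {
                return true;
            }
            if (!(obj instanceof Person)) {
                return false;
            }
            Person other = (Person) obj;
            return firstName.equals(other.firstName)
                    && surName.equals(other.surName)
                    && dateOfBirth.equals(other.dateOfBirth);
        }

        @Override
        public int hashCode() {
            return Objects.hash(firstName, surName, dateOfBirth);
        }

        @Override
        public String toString() {
            return firstName + " " + surName + " (" + dateOfBirth + ")";
        }
    }

    // Made-up sample data containing one duplicate entry.
    static List<Person> sample() {
        return Arrays.asList(
                new Person("Mari", "Tamm", "1980-05-01"),
                new Person("Jaan", "Kask", "1975-03-12"),
                new Person("Mari", "Tamm", "1980-05-01"));
    }

    // LinkedHashSet drops duplicates via equals()/hashCode();
    // Collections.sort then orders the survivors by date of birth.
    static List<Person> dedupeAndSort(List<Person> input) {
        List<Person> unique = new ArrayList<>(new LinkedHashSet<>(input));
        Collections.sort(unique, Comparator.comparing((Person p) -> p.dateOfBirth));
        return unique;
    }

    public static void main(String[] args) {
        for (Person p : dedupeAndSort(sample())) {
            System.out.println(p);
        }
    }
}
```

Compared with the TreeSet answer, this keeps the notion of "same person" (equals and hashCode) separate from the sort order (the Comparator), at the cost of an extra copy into a list.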