I'm trying to traverse a simple XML document with Java, but for some reason whitespace is being counted as nodes. For example, I have this:

        factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = factory.newDocumentBuilder();
        DOMImplementation domImpl = builder.getDOMImplementation(); 
        factory.setIgnoringComments(true);
        factory.setIgnoringElementContentWhitespace(true);
        DOMImplementationLS ls = (DOMImplementationLS) domImpl.getFeature("LS", "3.0");
        LSInput in = ls.createLSInput();
        in.setByteStream(is);
        LSParser parser = ls.createLSParser(DOMImplementationLS.MODE_SYNCHRONOUS, "http://www.w3.org/2001/XMLSchema");

        document = parser.parse(in);
        document.getDocumentElement().getFirstChild()

So for the following XML, the first child returned is some combination of whitespace.

<?xml version="1.0"?>
<opendap>
<root url="http://localhost/" filter=".*" />
<rewrite>
    <var name="altitude" type="enum" resAttr="getNodeName" profattr="profattr"/>

</rewrite>
<constants>
    <catalogURL>http://google.com</catalogURL>
</constants>
<resAttr>
    <Publishers>person1</Publishers>
    <Publishers>person2</Publishers>
</resAttr>

</opendap>

How do I fix this?

Edit: I've partly worked around it with the loop below (resAttr is the Element representing <resAttr>). Unfortunately, setValidating didn't work.

    for (Node child = this.resAttr.getFirstChild(); child != null; child = child.getNextSibling()) {
        // Whitespace-only text nodes have no children, so this check skips them.
        if (child.getFirstChild() != null && child.getFirstChild().getNodeValue() != null) {
            String nodename = child.getNodeName();
            String nodevalue = child.getFirstChild().getNodeValue();
            // ...
        }
    }
  • What version of Java and JAXP are you using? There is a bug with that method: bugs.sun.com/bugdatabase/view_bug.do?bug_id=6545684 Commented Nov 2, 2010 at 22:15
  • // Compiled from DocumentBuilder.java (version 1.5 : 49.0, super bit) public abstract class javax.xml.parsers.DocumentBuilder. Also using Java 1.6. Commented Nov 2, 2010 at 22:47

2 Answers

Sackers is on the right track - the parser needs to be in validating mode. The document probably also needs a grammar (the parser documentation also mentions sections 2.10 and 3.2.1 of the XML spec).

For example, configured with setValidating(true) and setIgnoringElementContentWhitespace(true), the parser will strip the whitespace between the x and y elements, but not within the y element since this is PCDATA:

<?xml version="1.0"?>

<!DOCTYPE x [
  <!ELEMENT x (y+)>
  <!ELEMENT y (#PCDATA)>
]>

<x>
  <y>  </y>
</x>
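
A minimal sketch of how the document above could be parsed, assuming the JDK's bundled JAXP parser and a plain DocumentBuilder instead of the LSParser from the question (it is not clear that factory settings carry over to an LSParser obtained through DOMImplementationLS). The class name IgnoreWhitespaceDemo and the parse helper are made up for illustration. Note that both setters are called before newDocumentBuilder(), since the builder is configured when it is created:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class IgnoreWhitespaceDemo {
    // The sample document from this answer, including its internal DTD.
    static final String XML =
        "<?xml version=\"1.0\"?>\n"
        + "<!DOCTYPE x [\n"
        + "  <!ELEMENT x (y+)>\n"
        + "  <!ELEMENT y (#PCDATA)>\n"
        + "]>\n"
        + "<x>\n"
        + "  <y>  </y>\n"
        + "</x>";

    // Hypothetical helper: parses a string with a validating,
    // whitespace-ignoring builder.
    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Configure the factory BEFORE creating the builder.
        factory.setValidating(true);
        factory.setIgnoringElementContentWhitespace(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(XML);
        Node first = doc.getDocumentElement().getFirstChild();
        // With the DTD in place, the first child should be the y element
        // rather than a whitespace text node.
        System.out.println(first.getNodeName());
        // The whitespace inside y is PCDATA, so it is kept.
        System.out.println("[" + first.getTextContent() + "]");
    }
}
```

The XML in the question has no DOCTYPE, so this approach would only apply there once a DTD (or other grammar) describing the opendap element is added.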

Looking at the docs for 'setIgnoringElementContentWhitespace' - 'Due to reliance on the content model this setting requires the parser to be in validating mode.'. Have you tried:

factory.setValidating(true);
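
If the document has no DTD, as with the XML in the question, validating mode has no content model to work from, which may be why it appeared not to help. A possible fallback, offered here as a sketch rather than as the method either answer proposes, is to skip non-element nodes while iterating. The class ElementChildren and its helpers are hypothetical names:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class ElementChildren {
    // Returns only the element children of a node, skipping text
    // (including whitespace-only text) and comment nodes.
    static List<Element> childElements(Node parent) {
        List<Element> result = new ArrayList<Element>();
        for (Node child = parent.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                result.add((Element) child);
            }
        }
        return result;
    }

    // Parses a string with a default (non-validating) builder.
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        // The resAttr fragment from the question, with its indentation.
        String xml = "<resAttr>\n"
                + "    <Publishers>person1</Publishers>\n"
                + "    <Publishers>person2</Publishers>\n"
                + "</resAttr>";
        Document doc = parse(xml);
        for (Element e : childElements(doc.getDocumentElement())) {
            System.out.println(e.getNodeName() + " = " + e.getTextContent());
        }
    }
}
```

This leaves the DOM untouched and simply ignores the whitespace text nodes at read time, so it works without any grammar.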
