Problems traversing xml file with Java

Question

I'm trying to traverse a simple XML document with Java, but for some reason whitespace is being counted as nodes. For example, I have this:

        factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = factory.newDocumentBuilder();
        DOMImplementation domImpl = builder.getDOMImplementation(); 
        factory.setIgnoringComments(true);
        factory.setIgnoringElementContentWhitespace(true);
        DOMImplementationLS ls = (DOMImplementationLS) domImpl.getFeature("LS", "3.0");
        LSInput in = ls.createLSInput();
        in.setByteStream(is);
        LSParser parser = ls.createLSParser(DOMImplementationLS.MODE_SYNCHRONOUS, "http://www.w3.org/2001/XMLSchema");

        document = parser.parse(in);
        document.getDocumentElement().getFirstChild()

So for the following XML, the first child returned is some combination of whitespace.

<?xml version="1.0"?>
<opendap>
<root url="http://localhost/" filter=".*" />
<rewrite>
    <var name="altitude" type="enum" resAttr="getNodeName" profattr="profattr"/>

</rewrite>
<constants>
    <catalogURL>http://google.com</catalogURL>
</constants>
<resAttr>
    <Publishers>person1</Publishers>
    <Publishers>person2</Publishers>
</resAttr>

</opendap>

How do I fix this?

Edit: I've kind of fixed it by doing this (resAttr is the Element representing <resAttr>). Unfortunately, setValidating didn't work.

    for (Node child = this.resAttr.getFirstChild(); child != null; child = child.getNextSibling()) {
        // Whitespace-only text nodes have no children, so this check skips them.
        if (child.getFirstChild() != null && child.getFirstChild().getNodeValue() != null) {
            String nodename = child.getNodeName();
            String nodevalue = child.getFirstChild().getNodeValue();
            // ...
        }
    }
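
A more explicit variant of this workaround is to filter children by node type instead of probing for a first child. The sketch below is only an illustration: the class name ChildElements and the helpers childElements and parse are hypothetical, not part of the original code, and it assumes a plain non-validating DocumentBuilder.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class ChildElements {
    /** Returns only the element children of parent, skipping text and comment nodes. */
    static List<Element> childElements(Node parent) {
        List<Element> result = new ArrayList<>();
        for (Node child = parent.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                result.add((Element) child);
            }
        }
        return result;
    }

    /** Parses an XML string with a default (non-validating) builder. */
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        String xml = "<resAttr>\n"
                + "    <Publishers>person1</Publishers>\n"
                + "    <Publishers>person2</Publishers>\n"
                + "</resAttr>";
        Document doc = parse(xml);
        // The whitespace text nodes between the Publishers elements are skipped.
        for (Element e : childElements(doc.getDocumentElement())) {
            System.out.println(e.getNodeName() + " = " + e.getTextContent());
        }
    }
}
```

This does not remove the whitespace nodes from the tree; it just ignores them during traversal, so it works whether or not the document has a DTD.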

What version of Java and JAXP are you using? There is a bug with that method: bugs.sun.com/bugdatabase/view_bug.do?bug_id=6545684
– Jeff Storey, Commented Nov 2, 2010 at 22:15
// Compiled from DocumentBuilder.java (version 1.5 : 49.0, super bit) public abstract class javax.xml.parsers.DocumentBuilder. Also using Java 1.6.
– victor, Commented Nov 2, 2010 at 22:47

Community · Accepted Answer · 2017-05-23 12:11:31Z

2

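Sackers is on the right track: the parser needs to be in validating mode. The document probably also needs a grammar such as a DTD, because the parser relies on the content model to decide which whitespace is ignorable (the setIgnoringElementContentWhitespace documentation refers to sections 2.10 and 3.2.1 of the XML spec).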

For example, configured with setValidating(true) and setIgnoringElementContentWhitespace(true), the parser will strip the whitespace between the x and y elements, but not within the y element since this is PCDATA:

<?xml version="1.0"?>

<!DOCTYPE x [
  <!ELEMENT x (y+)>
  <!ELEMENT y (#PCDATA)>
]>

<x>
  <y>  </y>
</x>
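
As a rough sketch of how this could be wired up with a plain DocumentBuilder, the following parses the document above and prints the name of the root's first child. The class name IgnoreWhitespaceDemo and the parse helper are made up for illustration. Note that both flags are set before newDocumentBuilder() is called, because factory settings only apply to builders created afterwards; it also assumes parsing goes through the DocumentBuilder rather than through an LSParser as in the question.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class IgnoreWhitespaceDemo {
    // The example document from this answer, including its internal DTD.
    static final String XML =
          "<?xml version=\"1.0\"?>\n"
        + "<!DOCTYPE x [\n"
        + "  <!ELEMENT x (y+)>\n"
        + "  <!ELEMENT y (#PCDATA)>\n"
        + "]>\n"
        + "<x>\n"
        + "  <y>  </y>\n"
        + "</x>\n";

    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Configure the factory first; a builder created earlier would not see these flags.
        factory.setValidating(true);
        factory.setIgnoringElementContentWhitespace(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(XML);
        Node first = doc.getDocumentElement().getFirstChild();
        // With the DTD and validation enabled, this is the y element rather than a text node.
        System.out.println(first.getNodeName());
    }
}
```

The whitespace inside y is still present as a text node, since y is declared as #PCDATA.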

edited May 23, 2017 at 12:11

CommunityBot

answered Nov 3, 2010 at 0:05

McDowell


Sackers · Accepted Answer · 2010-11-02 23:36:16Z

1

Looking at the docs for setIgnoringElementContentWhitespace: "Due to reliance on the content model this setting requires the parser to be in validating mode." Have you tried:

factory.setValidating(true);

answered Nov 2, 2010 at 23:36

Sackers
