Getting XML Node text value with Java DOM

Question

I can't fetch text value with Node.getNodeValue(), Node.getFirstChild().getNodeValue() or with Node.getTextContent().

My XML is like

<add job="351">
    <tag>foobar</tag>
    <tag>foobar2</tag>
</add>

And I'm trying to get tag value (non-text element fetching works fine). My Java code sounds like

Document doc = db.parse(new File(args[0]));
Node n = doc.getFirstChild();
NodeList nl = n.getChildNodes();   
Node an,an2;

for (int i=0; i < nl.getLength(); i++) {
    an = nl.item(i);

    if(an.getNodeType()==Node.ELEMENT_NODE) {
        NodeList nl2 = an.getChildNodes();

        for(int i2=0; i2<nl2.getLength(); i2++) {
            an2 = nl2.item(i2);

            // DEBUG PRINTS
            System.out.println(an2.getNodeName() + ": type (" + an2.getNodeType() + "):");

            if(an2.hasChildNodes())
                System.out.println(an2.getFirstChild().getTextContent());

            if(an2.hasChildNodes())
                System.out.println(an2.getFirstChild().getNodeValue());

            System.out.println(an2.getTextContent());
            System.out.println(an2.getNodeValue());
        }
    }
}

It prints out

tag type (1): 
tag1
tag1
tag1
null
#text type (3):
_blank line_
_blank line_
...

Thanks for the help.

It would help if you clearly indicated what the variable 'n' is currently holding exactly, the Document or the documentElement ? — AnthonyWJones
– AnthonyWJones, Commented Apr 21, 2009 at 15:06

Jon · Accepted Answer · 2016-08-25 20:41:12Z

57

I'd print out the result of an2.getNodeName() as well for debugging purposes. My guess is that your tree crawling code isn't crawling to the nodes that you think it is. That suspicion is enhanced by the lack of checking for node names in your code.

Other than that, the javadoc for Node defines "getNodeValue()" to return null for Nodes of type Element. Therefore, you really should be using getTextContent(). I'm not sure why that wouldn't give you the text that you want.

Perhaps iterate the children of your tag node and see what types are there?

Tried this code and it works for me:

String xml = "<add job=\"351\">\n" +
             "    <tag>foobar</tag>\n" +
             "    <tag>foobar2</tag>\n" +
             "</add>";
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
ByteArrayInputStream bis = new ByteArrayInputStream(xml.getBytes());
Document doc = db.parse(bis);
Node n = doc.getFirstChild();
NodeList nl = n.getChildNodes();
Node an,an2;

for (int i=0; i < nl.getLength(); i++) {
    an = nl.item(i);
    if(an.getNodeType()==Node.ELEMENT_NODE) {
        NodeList nl2 = an.getChildNodes();

        for(int i2=0; i2<nl2.getLength(); i2++) {
            an2 = nl2.item(i2);
            // DEBUG PRINTS
            System.out.println(an2.getNodeName() + ": type (" + an2.getNodeType() + "):");
            if(an2.hasChildNodes()) System.out.println(an2.getFirstChild().getTextContent());
            if(an2.hasChildNodes()) System.out.println(an2.getFirstChild().getNodeValue());
            System.out.println(an2.getTextContent());
            System.out.println(an2.getNodeValue());
        }
    }
}

Output was:

#text: type (3): foobar foobar
#text: type (3): foobar2 foobar2

edited Aug 25, 2016 at 20:41

Jon

9,9639 gold badges63 silver badges78 bronze badges

answered Apr 21, 2009 at 15:04

jsight

28.5k25 gold badges111 silver badges140 bronze badges

Sign up to request clarification or add additional context in comments.

5 Comments

Emilio Over a year ago

now i'm printing also .getNodeName().. and it returns the right value (tag)

Emilio Over a year ago

My tag element doesn't have childs :/ If I try simply with an2.getFirstChild().getTextContent() or similar it throw a NullPointerException

jsight Over a year ago

Try just using getChildElements instead of getFirstChild(). Perhaps getFirstChild() is skipping over Element typed nodes for some reason?

user204503 Over a year ago

i have come across a similar scenario where i have a child node and need to extract its content. I have had debug statements to make sure i reached the correct node but unlike what jsight gave in his solution nothing really comes as output. I have literally copy pasted his code and event then it doesn't work for the example.

Nathan G Over a year ago

Is there a way to get the outer markup of the Node, e.g. <tag>foobar</tag> ?

toolkit · Accepted Answer · 2009-04-21 15:22:34Z

22

If your XML goes quite deep, you might want to consider using XPath, which comes with your JRE, so you can access the contents far more easily using:

String text = xp.evaluate("//add[@job='351']/tag[position()=1]/text()", 
    document.getDocumentElement());

Full example:

import static org.junit.Assert.assertEquals;
import java.io.StringReader;    
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;    
import org.junit.Before;
import org.junit.Test;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class XPathTest {

    private Document document;

    @Before
    public void setup() throws Exception {
        String xml = "<add job=\"351\"><tag>foobar</tag><tag>foobar2</tag></add>";
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        DocumentBuilder db = dbf.newDocumentBuilder();
        document = db.parse(new InputSource(new StringReader(xml)));
    }

    @Test
    public void testXPath() throws Exception {
        XPathFactory xpf = XPathFactory.newInstance();
        XPath xp = xpf.newXPath();
        String text = xp.evaluate("//add[@job='351']/tag[position()=1]/text()",
                document.getDocumentElement());
        assertEquals("foobar", text);
    }
}

answered Apr 21, 2009 at 15:22

toolkit

50.4k18 gold badges113 silver badges140 bronze badges

2 Comments

Emilio Over a year ago

Unfortunatly is an educational job and i must use DOM apis :/

ben.snape Over a year ago

Thanks, this complete example (with imports) really helped me out after struggling with other similar solutions.

Zeus · Accepted Answer · 2015-02-18 17:16:43Z

6

I use a very old java. Jdk 1.4.08 and I had the same issue. The Node class for me did not had the getTextContent() method. I had to use Node.getFirstChild().getNodeValue() instead of Node.getNodeValue() to get the value of the node. This fixed for me.

edited Feb 18, 2015 at 17:16

answered Feb 18, 2015 at 17:08

Zeus

6,6347 gold badges60 silver badges94 bronze badges

Comments

vtd-xml-author · Accepted Answer · 2016-04-24 01:50:26Z

If you are open to vtd-xml, which excels at both performance and memory efficiency, below is the code to do what you are looking for...in both XPath and manual navigation... the overall code is much concise and easier to understand ...

import com.ximpleware.*;
public class queryText {
    public static void main(String[] s) throws VTDException{
        VTDGen vg = new VTDGen();
        if (!vg.parseFile("input.xml", true))
            return;
        VTDNav vn = vg.getNav();
        AutoPilot ap = new AutoPilot(vn);
        // first manually navigate
        if(vn.toElement(VTDNav.FC,"tag")){
            int i= vn.getText();
            if (i!=-1){
                System.out.println("text ===>"+vn.toString(i));
            }
            if (vn.toElement(VTDNav.NS,"tag")){
                i=vn.getText();
                System.out.println("text ===>"+vn.toString(i));
            }
        }

        // second version use XPath
        ap.selectXPath("/add/tag/text()");
        int i=0;
        while((i=ap.evalXPath())!= -1){
            System.out.println("text node ====>"+vn.toString(i));
        }
    }
}

Collectives™ on Stack Overflow

Getting XML Node text value with Java DOM

4 Answers 4

5 Comments

2 Comments

Comments

Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

4 Answers 4

5 Comments

2 Comments

Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related