Java, XPath Expression to read all node names, node values, and attributes

Question

I need help writing an XPath expression to read all node names, node values, and attributes in an XML string. I made this:

private List<String> listOne = new ArrayList<String>();
private List<String> listTwo = new ArrayList<String>();

public void read(String xml) {
    try {
        // Turn String into a Document
        Document document = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().parse(new ByteArrayInputStream(xml.getBytes()));

        // Setup XPath to retrieve all tags and values
        XPath xPath = XPathFactory.newInstance().newXPath();
        NodeList nodeList = (NodeList) xPath.evaluate("//text()[normalize-space()='']", document, XPathConstants.NODESET);

        // Iterate through nodes
        for(int i = 0; i < nodeList.getLength(); i++) {
            Node node = nodeList.item(i);
            listOne.add(node.getNodeName());
            listTwo.add(node.getNodeValue());
            // Another list to hold attributes
        }

    } catch(Exception e) {
        LogHandle.info(e.getMessage());
    }
}

I found the expression //text()[normalize-space()=''] online; however, it doesn't work. When I try to get the node name from listOne, it is just #text. I tried //, but that doesn't work either. If I had this XML:

<Data xmlns="Somenamespace.nsc">
    <Test>blah</Test>
    <Foo>bar</Foo>
    <Date id="2">12242016</Date>
    <Phone>
        <Home>5555555555</Home>
        <Mobile>5555556789</Mobile>
    </Phone>
</Data>

listOne[0] should hold Data, listOne[1] should hold Test, listTwo[1] should hold blah, and so on. All the attributes will be saved in another parallel list.

What expression should xPath evaluate?

Note: The XML String can have different tags, so I can't hard code anything.

Update: Tried this loop:

NodeList nodeList = (NodeList) xPath.evaluate("//*", document, XPathConstants.NODESET);

// Iterate through nodes
for(int i = 0; i < nodeList.getLength(); i++) {
    Node node = nodeList.item(i);

    listOne.add(i, node.getNodeName());

    // If null then must be text node
    if(node.getChildNodes() == null)
        listTwo.add(i, node.getTextContent());
}

However, this only gets the root element Data, then just stops.

text() refers to element content. In your example XML, blah, bar and 12242016 are text nodes. So, text() probably is not what you want.
– VGR, Commented Jun 13, 2016 at 20:10
Thanks! If text() gives the element content, will node() give the nodes?
– syy, Commented Jun 13, 2016 at 20:26
I think some clarification might be needed. In XML, “node” refers to every possible piece of information in an XML document, including text, comments, processing instructions, etc., whereas “element” refers to information consisting of either a start tag and a matching end tag, or a single self-closing tag (<name … />). Do you really want to read every node, or just every element and its attributes?
– VGR, Commented Jun 13, 2016 at 20:52
Thanks for the clarification. I want to read every element, any text associated with it (<Name>Flow</Name>), and its attributes if there are any. Hope I got the meanings correct.
– syy, Commented Jun 13, 2016 at 21:49

Martin Honnen · Accepted Answer · 2016-06-14 10:26:57Z

//* selects all element nodes, and //@* selects all attribute nodes. However, an element node does not have a meaningful node value in the DOM (getNodeValue() returns null for elements), so you need to read getTextContent() instead of getNodeValue().

As you seem to consider an element with child elements to have a "null" value, I think you need to check whether there are any child elements:

    DocumentBuilderFactory docBuilderFactory = DocumentBuilderFactory.newInstance();
    docBuilderFactory.setNamespaceAware(true);

    DocumentBuilder docBuilder = docBuilderFactory.newDocumentBuilder();

    Document doc = docBuilder.parse("sampleInput1.xml");

    XPathFactory fact = XPathFactory.newInstance();
    XPath xpath = fact.newXPath();

    NodeList allElements = (NodeList)xpath.evaluate("//*", doc, XPathConstants.NODESET);

    ArrayList<String> elementNames = new ArrayList<>();
    ArrayList<String> elementValues = new ArrayList<>();

    for (int i = 0; i < allElements.getLength(); i++)
    {
        Node currentElement = allElements.item(i);
        elementNames.add(i, currentElement.getLocalName());
        elementValues.add(i, xpath.evaluate("*", currentElement, XPathConstants.NODE) != null ? null : currentElement.getTextContent());
    }

    for (int i = 0; i < elementNames.size(); i++)
    {
        System.out.println("Name: " + elementNames.get(i) + "; value: " + (elementValues.get(i)));
    }

For the sample input

<Data xmlns="Somenamespace.nsc">
    <Test>blah</Test>
    <Foo>bar</Foo>
    <Date id="2">12242016</Date>
    <Phone>
        <Home>5555555555</Home>
        <Mobile>5555556789</Mobile>
    </Phone>
</Data>

the output is

Name: Data; value: null
Name: Test; value: blah
Name: Foo; value: bar
Name: Date; value: 12242016
Name: Phone; value: null
Name: Home; value: 5555555555
Name: Mobile; value: 5555556789
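
The question also asks for attributes, which the loop above does not collect. As a minimal sketch, assuming a third parallel list that holds one name-to-value map per element, the same //* traversal could be extended as follows. The class name ElementDump, its fields, and the read method are hypothetical and only serve the illustration; namespace declarations such as xmlns="..." are skipped on the assumption that they should not count as attributes.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class; the name and fields are illustrative only.
class ElementDump {
    final List<String> names = new ArrayList<>();
    final List<String> values = new ArrayList<>();
    final List<Map<String, String>> attributes = new ArrayList<>();

    static ElementDump read(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList elements = (NodeList) xpath.evaluate("//*", doc, XPathConstants.NODESET);

        ElementDump dump = new ElementDump();
        for (int i = 0; i < elements.getLength(); i++) {
            Node element = elements.item(i);
            dump.names.add(element.getLocalName());

            // Same rule as the answer: elements with child elements get a null value.
            boolean hasChildElement =
                    xpath.evaluate("*", element, XPathConstants.NODE) != null;
            dump.values.add(hasChildElement ? null : element.getTextContent());

            // Collect attributes, skipping namespace declarations such as xmlns="...".
            Map<String, String> attrs = new LinkedHashMap<>();
            NamedNodeMap attrNodes = element.getAttributes();
            for (int j = 0; j < attrNodes.getLength(); j++) {
                Node attr = attrNodes.item(j);
                String attrName = attr.getNodeName();
                if (attrName.equals("xmlns") || attrName.startsWith("xmlns:")) {
                    continue;
                }
                attrs.put(attrName, attr.getNodeValue());
            }
            dump.attributes.add(attrs);
        }
        return dump;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<Data xmlns=\"Somenamespace.nsc\"><Test>blah</Test>"
                + "<Date id=\"2\">12242016</Date></Data>";
        ElementDump dump = read(xml);
        for (int i = 0; i < dump.names.size(); i++) {
            System.out.println("Name: " + dump.names.get(i)
                    + "; value: " + dump.values.get(i)
                    + "; attributes: " + dump.attributes.get(i));
        }
    }
}
```

Keeping one map per element keeps the three lists aligned by index, so an element without attributes simply gets an empty map at its position.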

edited Jun 14, 2016 at 10:26

answered Jun 13, 2016 at 20:02

Martin Honnen

5 Comments

syy Over a year ago

I used //* with getTextContent() and was able to get tag names and values. However, for parent nodes like Data, the text content it returns is everything from its children. So listTwo.get(0) returns blah, bar, 12242016. I tried skipping the text content when getChildNodes() is not null, but then the loop just stops. How do I make it so listOne(0) is Data, listTwo(0) is null, listOne(1) is Test, and listTwo(1) is blah? I'll update the OP.

Martin Honnen Over a year ago

getChildNodes gives you a NodeList, never null. And even <foo>bar</foo> has a child node, a text node. Also what do you want to do with mixed content like <p>This is <b>bold</b> text.</p>? You need to explain more carefully which results you want.
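
The point about getChildNodes() can be checked with a small sketch. The class ChildNodesDemo and its two helper methods are hypothetical names introduced here for illustration; the sketch counts child nodes and child elements for a text-only element, an empty element, and the mixed-content example from the comment.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class; the names are illustrative only.
class ChildNodesDemo {
    // Parses an XML string and returns its root element.
    static Element parseRoot(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
    }

    // Counts only the children that are elements, ignoring text and other nodes.
    static int childElementCount(Node node) {
        NodeList children = node.getChildNodes();
        int count = 0;
        for (int i = 0; i < children.getLength(); i++) {
            if (children.item(i).getNodeType() == Node.ELEMENT_NODE) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws Exception {
        String[] samples = {
            "<foo>bar</foo>",
            "<foo/>",
            "<p>This is <b>bold</b> text.</p>"
        };
        for (String xml : samples) {
            Element root = parseRoot(xml);
            System.out.println(xml
                    + " -> child nodes: " + root.getChildNodes().getLength()
                    + ", child elements: " + childElementCount(root));
        }
    }
}
```

Testing the number of child elements, rather than comparing getChildNodes() to null, is what separates a leaf element such as Test from a container such as Phone.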

syy Over a year ago

Oh, I see now. Regarding your example, I won't have a case like that. It will be strictly like the one shown in the OP (added to the XML example a bit more). I just want listOne to hold all the elements and listTwo to hold the text associated with them. However, if an element has children and no direct text, then for that index listTwo should be null as shown in the example in the above comment.

Martin Honnen Over a year ago

@Flow, see the edited sample; it uses XPath to check whether an element has child elements and only calls getTextContent() if not.

syy Over a year ago

Thank you very much for all the help! Learned a lot from this.
