
I need help writing an XPath expression to read all node names, node values, and attributes in an XML string. I wrote this:

private List<String> listOne = new ArrayList<String>();
private List<String> listTwo = new ArrayList<String>();

public void read(String xml) {
    try {
        // Turn String into a Document
        Document document = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().parse(new ByteArrayInputStream(xml.getBytes()));

        // Setup XPath to retrieve all tags and values
        XPath xPath = XPathFactory.newInstance().newXPath();
        NodeList nodeList = (NodeList) xPath.evaluate("//text()[normalize-space()='']", document, XPathConstants.NODESET);

        // Iterate through nodes
        for(int i = 0; i < nodeList.getLength(); i++) {
            Node node = nodeList.item(i);
            listOne.add(node.getNodeName());
            listTwo.add(node.getNodeValue());
            // Another list to hold attributes
        }

    } catch(Exception e) {
        LogHandle.info(e.getMessage());
    }
}

I found the expression //text()[normalize-space()=''] online; however, it doesn't work. When I try to get the node name from listOne, it is just #text. I tried //, but that doesn't work either. If I had this XML:

<Data xmlns="Somenamespace.nsc">
    <Test>blah</Test>
    <Foo>bar</Foo>
    <Date id="2">12242016</Date>
    <Phone>
        <Home>5555555555</Home>
        <Mobile>5555556789</Mobile>
    </Phone>
</Data>

listOne[0] should hold Data, listOne[1] should hold Test, listTwo[1] should hold blah, etc... All the attributes will be saved in another parallel list.

What expression should xPath evaluate?

Note: The XML String can have different tags, so I can't hard code anything.

Update: Tried this loop:

NodeList nodeList = (NodeList) xPath.evaluate("//*", document, XPathConstants.NODESET);

// Iterate through nodes
for(int i = 0; i < nodeList.getLength(); i++) {
    Node node = nodeList.item(i);

    listOne.add(i, node.getNodeName());

    // If null then must be text node
    if(node.getChildNodes() == null)
        listTwo.add(i, node.getTextContent());
}

However, this only gets the root element Data, then just stops.

  • text() refers to element content. In your example XML, blah, bar and 12242016 are text nodes. So, text() probably is not what you want. Commented Jun 13, 2016 at 20:10
  • Thanks! If text() gives the element content, will node() give the nodes? Commented Jun 13, 2016 at 20:26
  • I think some clarification might be needed. In XML, “node” refers to every possible piece of information in an XML document, including text, comments, processing instructions, etc., whereas “element” refers to information consisting of either a start tag and a matching end tag, or a single self-closing tag (<name … />). Do you really want to read every node, or just every element and its attributes? Commented Jun 13, 2016 at 20:52
  • Thanks for clarification. I want to read every element, any text associated with it (<Name>Flow</Name>), and its attributes if there are any. Hope I got the meanings correct. Commented Jun 13, 2016 at 21:49

1 Answer


//* will select all element nodes, //@* all attribute nodes. However, an element node does not have a meaningful node value in the DOM (getNodeValue() returns null for elements), so you would need to read out getTextContent() instead of getNodeValue().
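To make the difference between the two expressions concrete, here is a minimal sketch, not a definitive implementation. The class name XPathSelectDemo and the helper describe are hypothetical names introduced only for illustration; the XML is a shortened version of the sample from the question.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XPathSelectDemo {

    // Hypothetical helper: evaluates an XPath expression against an XML string
    // and reports name, node value and text content of every selected node.
    public static List<String> describe(String xml, String expression) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);

            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Node node = nodes.item(i);
                result.add(node.getNodeName()
                        + " nodeValue=" + node.getNodeValue()
                        + " textContent=" + node.getTextContent());
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate " + expression, e);
        }
    }

    public static void main(String[] args) {
        String xml = "<Data xmlns=\"Somenamespace.nsc\"><Date id=\"2\">12242016</Date></Data>";
        // Elements: the node value is null, the text comes from getTextContent()
        describe(xml, "//*").forEach(System.out::println);
        // Attributes: the node value holds the attribute value
        describe(xml, "//@*").forEach(System.out::println);
    }
}
```

For element nodes the nodeValue part is null, which is why the text has to be read with getTextContent(); for attribute nodes getNodeValue() returns the attribute value directly.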

As you seem to consider an element with child elements to have a "null" value, I think you need to check whether there are any child elements:

    DocumentBuilderFactory docBuilderFactory = DocumentBuilderFactory.newInstance();
    docBuilderFactory.setNamespaceAware(true);

    DocumentBuilder docBuilder = docBuilderFactory.newDocumentBuilder();

    Document doc = docBuilder.parse("sampleInput1.xml");

    XPathFactory fact = XPathFactory.newInstance();
    XPath xpath = fact.newXPath();

    NodeList allElements = (NodeList)xpath.evaluate("//*", doc, XPathConstants.NODESET);

    ArrayList<String> elementNames = new ArrayList<>();
    ArrayList<String> elementValues = new ArrayList<>();

    for (int i = 0; i < allElements.getLength(); i++)
    {
        Node currentElement = allElements.item(i);
        elementNames.add(i, currentElement.getLocalName());
        elementValues.add(i, xpath.evaluate("*", currentElement, XPathConstants.NODE) != null ? null : currentElement.getTextContent());
    }

    for (int i = 0; i < elementNames.size(); i++)
    {
        System.out.println("Name: " + elementNames.get(i) + "; value: " + (elementValues.get(i)));
    }

For the sample input

<Data xmlns="Somenamespace.nsc">
    <Test>blah</Test>
    <Foo>bar</Foo>
    <Date id="2">12242016</Date>
    <Phone>
        <Home>5555555555</Home>
        <Mobile>5555556789</Mobile>
    </Phone>
</Data>

the output is

Name: Data; value: null
Name: Test; value: blah
Name: Foo; value: bar
Name: Date; value: 12242016
Name: Phone; value: null
Name: Home; value: 5555555555
Name: Mobile; value: 5555556789
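The question also asks for the attributes in a parallel list, which the code above does not cover. One possible way, sketched here under assumptions, is to read getAttributes() for each element selected by //*. The class name ElementAttributesDemo and the method attributesPerElement are hypothetical, and storing one map per element (empty when there are no attributes) is just one design choice that keeps the indexes aligned with the name and value lists.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class ElementAttributesDemo {

    // Hypothetical helper: returns one map of attribute name -> value per element,
    // in the same document order that "//*" yields the elements.
    public static List<Map<String, String>> attributesPerElement(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList elements = (NodeList) xpath.evaluate("//*", doc, XPathConstants.NODESET);

            List<Map<String, String>> result = new ArrayList<>();
            for (int i = 0; i < elements.getLength(); i++) {
                NamedNodeMap attrs = elements.item(i).getAttributes();
                Map<String, String> map = new LinkedHashMap<>();
                for (int j = 0; j < attrs.getLength(); j++) {
                    Node attr = attrs.item(j);
                    String name = attr.getNodeName();
                    // The DOM exposes namespace declarations as attributes; skip them
                    if (name.equals("xmlns") || name.startsWith("xmlns:")) {
                        continue;
                    }
                    map.put(name, attr.getNodeValue());
                }
                result.add(map);
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("Could not read attributes", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<Data xmlns=\"Somenamespace.nsc\"><Test>blah</Test>"
                + "<Date id=\"2\">12242016</Date></Data>";
        List<Map<String, String>> attributes = attributesPerElement(xml);
        for (int i = 0; i < attributes.size(); i++) {
            System.out.println("Element " + i + " attributes: " + attributes.get(i));
        }
    }
}
```

Because every element gets an entry, index i in this list refers to the same element as index i in elementNames and elementValues.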

5 Comments

I used //* with getTextContent() and was able to get tag names and values. However, for parent nodes like Data, the text content it returns is everything from its children. So listTwo.get(0) returns blah, bar, 12242016. I tried checking whether getChildNodes() is not null and skipping the text content in that case, but then the loop just stops. How do I make it so listOne(0) is Data, listTwo(0) is null, listOne(1) is Test, listTwo(1) is blah? I'll update the OP.
getChildNodes gives you a NodeList, never null. And even <foo>bar</foo> has a child node, a text node. Also what do you want to do with mixed content like <p>This is <b>bold</b> text.</p>? You need to explain more carefully which results you want.
Oh, I see now. Regarding your example, I won't have a case like that. It will be strictly like the one shown in the OP (added to the XML example a bit more). I just want listOne to hold all the elements and listTwo to hold the text associated with them. However, if an element has children and no direct text, then for that index listTwo should be null as shown in the example in the above comment.
@Flow, see the edited sample; it uses XPath to check whether an element has child elements and only calls getTextContent() if it does not.
Thank you very much for all the help! Learned a lot from this.
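
The point made in the comments, that getChildNodes() never returns null and that even <foo>bar</foo> has a child node, can be checked with a small sketch. The class name ChildNodesDemo and the method childCounts are hypothetical names used only for this illustration; counting child elements by node type is an alternative to the XPath check used in the answer, not the answer's own method.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class ChildNodesDemo {

    // Hypothetical helper: for the root element of the given XML, returns
    // { number of child nodes of any type, number of child elements }.
    public static int[] childCounts(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Element root = doc.getDocumentElement();

            // getChildNodes() returns an empty NodeList rather than null
            NodeList children = root.getChildNodes();
            int elementChildren = 0;
            for (int i = 0; i < children.getLength(); i++) {
                if (children.item(i).getNodeType() == Node.ELEMENT_NODE) {
                    elementChildren++;
                }
            }
            return new int[] { children.getLength(), elementChildren };
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String[] samples = {
            "<foo>bar</foo>",                       // one text node, no child elements
            "<p>This is <b>bold</b> text.</p>",     // mixed content
            "<foo/>"                                // no children at all
        };
        for (String xml : samples) {
            int[] counts = childCounts(xml);
            System.out.println(xml + " -> child nodes: " + counts[0]
                    + ", child elements: " + counts[1]);
        }
    }
}
```

A null check on getChildNodes() therefore never distinguishes leaf elements from parents; counting child elements (or evaluating "*" relative to the element, as in the answer) does.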
