I have some XML that roughly looks like this:

<project type="mankind">
    <suggestion>Build the enterprise</suggestion>
    <suggestion>Learn Esperanto</suggestion>
    <problem>Solve world hunger</suggestion>
    <discussion>Do Vulcans exist</discussion>
</project>

I want to use XPath from Java to find the names of the second-level elements (there can be elements I won't know up front). This is the code I tried:

public NodeList xpath2NodeList(Document doc, String xPathString) throws XPathExpressionException {
     XPath xpath = XPathFactory.newInstance().newXPath();
     MagicNamespaceContext nsc = new MagicNamespaceContext();
     xpath.setNamespaceContext(nsc);
     Object exprResult = xpath.evaluate(xPathString, doc, XPathConstants.NODESET);
     return (NodeList) exprResult;
}

My XPath is /project/*/name(). I get the error:

javax.xml.transform.TransformerException: Unknown nodetype: name

A query like /project/suggestion works as expected. What am I missing? I'd like to get a list with the tag names.

4 Comments
  • What version of XPath are you working with? Commented Aug 29, 2014 at 14:59
  • you should use /project/@name not /project/*/name() Commented Aug 29, 2014 at 15:03
  • @aoulhent The OP is trying to retrieve the element names of /project/*, not the content of this attribute. Commented Aug 29, 2014 at 15:05
  • @MathiasMüller Java6 (don't ask). And thx for clarifying. Yes I don't need the attribute of the top level, but the Element. I'll update the question to remove the potential confusion Commented Aug 29, 2014 at 15:07

2 Answers

Java6 (don't ask).

I think your implementation only supports XPath 1.0. If so, only the following form would work:

"name(/project/*)"

The reason is that in the XPath 1.0 model, you cannot use a function (like name()) as a step in a path expression. Your code throws an exception because the processor mistakes the function name() for an unknown node test (like comment()). But there is nothing wrong with using a path expression as the argument of the name() function.

Unfortunately, if an XPath 1.0 function that can only handle a single node as its argument is given a node-set, only the first node in document order is used. Therefore, you will most likely get only the first element name as a result.
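
To make the first-node behaviour concrete, here is a minimal sketch that evaluates name() over a node-set and over a single positional step. It assumes the XPath 1.0 engine bundled with the JDK and an inline copy of the question's XML with the problem element closed correctly. The class name FirstNameOnly and the helper evaluateAsString are made up for illustration.

```java
import java.io.StringReader;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class FirstNameOnly {

    // Inline copy of the question's XML, with <problem> closed correctly (assumption).
    static final String XML =
          "<project type=\"mankind\">"
        + "<suggestion>Build the enterprise</suggestion>"
        + "<suggestion>Learn Esperanto</suggestion>"
        + "<problem>Solve world hunger</problem>"
        + "<discussion>Do Vulcans exist</discussion>"
        + "</project>";

    // Hypothetical helper: parse the XML and evaluate the expression as a string.
    static String evaluateAsString(String xml, String expression) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        // STRING is requested because name() yields a string, not a node-set.
        return (String) xpath.evaluate(expression, doc, XPathConstants.STRING);
    }

    public static void main(String[] args) throws Exception {
        // name() only looks at the first node of the node-set.
        System.out.println(evaluateAsString(XML, "name(/project/*)"));    // prints "suggestion"
        // A positional predicate selects one specific child instead.
        System.out.println(evaluateAsString(XML, "name(/project/*[3])")); // prints "problem"
    }
}
```

So a single XPath 1.0 expression can name one element at a time, but not list all of them.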

XPath 1.0's ability to manipulate results is very limited, and often the easiest way around such problems is to reach for the higher-level language that hosts XPath as its query language (in your case, Java). Put another way: write an XPath expression that retrieves all relevant nodes, then iterate over the result in Java and return the element names.

With XPath 2.0, your initial query would be fine. Also see this related question.

5 Comments

  • Much better. Now I get an org.apache.xpath.XPathException: Can not convert #STRING to a NodeList!, but I guess that's due to my XPathConstants.NODESET. When I try it with XPathConstants.STRING, it works. Now, is there a way to figure out whether a String or a node set is coming back? If I omit the parameter, a String always comes back.
  • @stwissel I'm not familiar with the Java Transformer, but is it necessary that the method that evaluates your path expression returns a NodeList? Can you write different methods, one for finding nodes and the other for strings?
  • That's plan B. Of course, I then must guess which of the XPath expressions (I read them from a config file) would return a String and which a node set (or I just try and catch the error).
  • @stwissel As I said, as much as I would like to help with Java, that's not exactly my field of expertise. I find it cumbersome to have to determine the return type in advance, and I'm sure there's a better way. How about having just one method to evaluate XPath, only writing expressions that evaluate to a node set, returning the nodes, and passing them to another method that takes a node set of elements as its argument and returns their names?
  • I won't write the XPath, so I have no control. I expect 90% to be node sets, so I'll just put an error handler around it and, if it fails, try again using text. Not sexy, but the XML isn't very big, so we're talking millisecond delays. Thx for the pointer on the syntax. Appreciate the swift reply.
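
The "try NODESET, then fall back to STRING" idea from the comments above could look roughly like this. It is only a sketch. It assumes the JDK's built-in XPath implementation, which reports the failed conversion as an XPathExpressionException (other implementations may behave differently). The names XPathFallback, evaluateLenient and parse are hypothetical, and the question's MagicNamespaceContext is left out.

```java
import java.io.StringReader;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathFallback {

    // Inline copy of the question's XML, with <problem> closed correctly (assumption).
    static final String XML =
          "<project type=\"mankind\">"
        + "<suggestion>Build the enterprise</suggestion>"
        + "<suggestion>Learn Esperanto</suggestion>"
        + "<problem>Solve world hunger</problem>"
        + "<discussion>Do Vulcans exist</discussion>"
        + "</project>";

    // Hypothetical helper: returns a NodeList when the expression selects nodes,
    // otherwise the String value of the expression.
    static Object evaluateLenient(Document doc, String expression) throws XPathExpressionException {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            return xpath.evaluate(expression, doc, XPathConstants.NODESET);
        } catch (XPathExpressionException notANodeSet) {
            // Assumption: the failure means "result is not a node-set".
            // A genuinely invalid expression fails again here and propagates.
            return xpath.evaluate(expression, doc, XPathConstants.STRING);
        }
    }

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(XML);
        Object nodes = evaluateLenient(doc, "/project/*");
        System.out.println(nodes instanceof NodeList);   // node-set expression
        Object name = evaluateLenient(doc, "name(/project/*)");
        System.out.println(name);                        // string expression
    }
}
```

The caller still has to check the result with instanceof, and each string-valued expression is evaluated twice. For small documents that cost is likely negligible.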

The code below may answer your original question.

    package com.example.xpath;

    import java.io.File;
    import java.io.FileInputStream;
    import java.io.FileNotFoundException;
    import java.io.IOException;

    import javax.xml.parsers.DocumentBuilder;
    import javax.xml.parsers.DocumentBuilderFactory;
    import javax.xml.parsers.ParserConfigurationException;
    import javax.xml.xpath.XPath;
    import javax.xml.xpath.XPathConstants;
    import javax.xml.xpath.XPathExpression;
    import javax.xml.xpath.XPathExpressionException;
    import javax.xml.xpath.XPathFactory;

    import org.w3c.dom.Document;
    import org.w3c.dom.Node;
    import org.w3c.dom.NodeList;
    import org.xml.sax.SAXException;

    public class XPathReader {

        static XPath xPath =  XPathFactory.newInstance().newXPath();

        public static void main(String[] args) {

            try {
                FileInputStream file = new FileInputStream(new File("c:/mankind.xml"));

                DocumentBuilderFactory builderFactory = DocumentBuilderFactory.newInstance();

                DocumentBuilder builder =  builderFactory.newDocumentBuilder();

                Document xmlDocument = builder.parse(file);

                XPathExpression expr = xPath.compile("//project/*");
                NodeList list= (NodeList) expr.evaluate(xmlDocument, XPathConstants.NODESET);
                for (int i = 0; i < list.getLength(); i++) {
                    Node node = list.item(i);
                    System.out.println(node.getNodeName() + "=" + node.getTextContent());
                }

            } catch (FileNotFoundException e) {
                e.printStackTrace();
            } catch (SAXException e) {
                e.printStackTrace();
            } catch (IOException e) {
                e.printStackTrace();
            } catch (ParserConfigurationException e) {
                e.printStackTrace();
            } catch (XPathExpressionException e) {
                e.printStackTrace();
            }        
        }
    }

Assuming the input is corrected (the <problem> element is closed with </problem> instead of </suggestion>), the code should print:

suggestion=Build the enterprise
suggestion=Learn Esperanto
problem=Solve world hunger
discussion=Do Vulcans exist

5 Comments

  • Thx for the code. The challenge here: the XPath returns the whole element, not just the name. I need to be able to distinguish between getting the name or the content of an element based on the XPath. With a node set returned, the post-processing step needs to do that, but that requires additional 'knowledge' besides the XPath expression.
  • The println in the code shows how to retrieve just the name, i.e. node.getNodeName().
  • Sorry for not being clear. I appreciate your help. The XPath expressions will be pulled from an external source. When I read "/project/*" (one slash should be sufficient, since it is on the top level), I get a node set back. When I get it back, I would not know whether the XPath was meant to retrieve the node name or the node value. So the XPath would be ambiguous. Using Java classes to retrieve the name is nicely shown in your example, but I was looking for the XPath directly returning them. Thx for helping out here.
  • I don't see how a NodeList can send back only the name or the value based on the XPath expression. Either it needs to return a NodeList when the xPathString = "/project/*" or a String when the xPathString = "name(/project/*[3])". One option is for the method signature to be 'public List<String> xpath2NodeList(Document doc, String xPathString) throws XPathExpressionException'. Vary the XPathConstants between NODESET and STRING, depending on whether the xPathString starts with 'name('. When using NODESET, place all the retrieved values in a List. When using STRING, return only the element name at List index 0.
  • That was exactly my problem. The best solution is to have a later XPath engine where /name() is working. name(something) returns a concatenated string if something is a nodeset. So some error handling is required, as you nicely pointed out.
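
The prefix-based dispatch suggested in the comments above might be sketched as follows. This is illustrative only. The class name PrefixDispatch, the method name xpath2List and the parse helper are made up, the namespace context from the question is omitted, and treating every expression that starts with "name(" as string-valued is a simplifying assumption that will misclassify other string-valued expressions such as count(...) or string(...).

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class PrefixDispatch {

    // Inline copy of the question's XML, with <problem> closed correctly (assumption).
    static final String XML =
          "<project type=\"mankind\">"
        + "<suggestion>Build the enterprise</suggestion>"
        + "<suggestion>Learn Esperanto</suggestion>"
        + "<problem>Solve world hunger</problem>"
        + "<discussion>Do Vulcans exist</discussion>"
        + "</project>";

    // Hypothetical method: always returns a list of strings, whatever the expression type.
    static List<String> xpath2List(Document doc, String xPathString) throws XPathExpressionException {
        XPath xpath = XPathFactory.newInstance().newXPath();
        List<String> result = new ArrayList<String>();
        if (xPathString.trim().startsWith("name(")) {
            // String-valued expression: the single name goes to index 0.
            result.add((String) xpath.evaluate(xPathString, doc, XPathConstants.STRING));
        } else {
            // Node-set expression: collect the text content of every selected node.
            NodeList nodes = (NodeList) xpath.evaluate(xPathString, doc, XPathConstants.NODESET);
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(nodes.item(i).getTextContent());
            }
        }
        return result;
    }

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(XML);
        System.out.println(xpath2List(doc, "/project/*"));
        System.out.println(xpath2List(doc, "name(/project/*[3])"));
    }
}
```

Returning List<String> in both branches spares the caller from inspecting the result type, at the price of losing the nodes themselves.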
