Java and xpath - xHtml parsing problem

Question

I'm trying to parse a well-formed XHTML document, and I'm having problems iterating over its nodes.
My XHTML has a structure like this:

<?xml version="1.0" encoding="UTF-8"?>
<html>
  <head>...</head>
  <body>
   ...
    <form>
    ...
      <div class="AB">    (1 or 2 times)
      ...                       
        <div class="CD">  
        ...
          <table>          
             <tbody>
                <tr>    (1 to N times)
                   <td> XXX </td>
                       <td> YYY </td> ...

The information I need is contained in the columns (td).
I want to construct N objects: each row (tr) holds, in its columns, the information needed to construct one object.
There are 1 or 2 divs of class="AB", so I'll end up with 1 or 2 AB objects, each containing a list of the objects created from the rows of its table.

So first I extract a NodeList of these AB divs:

NodeList ABlist = (NodeList) xpath.evaluate("//div[@class='AB']", document, XPathConstants.NODESET);

Now I'm trying to get a NodeList of all the tr elements of the first AB div:

NodeList trList = (NodeList) xpath.evaluate("/div/table//tr", ABlist.item(0), XPathConstants.NODESET);

In this case the trList is empty. Do you know what's wrong with my code?
Thank you

Pavel Minaev · Accepted Answer · 2009-07-29 22:17:50Z

2

The problem in your second, failing XPath expression is that it starts with a /:

/div/table//tr

In XPath, just as in file paths, starting a path with a / means "start from the root of the document". But you don't actually want to do that there - you want to start from your node. So:

div/table//tr

will do what you want.
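To make the difference concrete, here is a minimal sketch that evaluates both expressions against the first AB div. The class name `RelativeXPathDemo`, the helper `countRows`, and the inline sample markup are assumptions made for illustration; the sample is a simplified stand-in in which the CD div is a direct child of the AB div and the table is a direct child of the CD div. If the real document has extra elements between those levels, a descendant search such as `.//tr` may be needed instead.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class RelativeXPathDemo {

    // Hypothetical, simplified stand-in for the document in the question.
    static final String SAMPLE =
          "<html><body><form>"
        + "<div class='AB'><div class='CD'><table><tbody>"
        + "<tr><td>XXX</td><td>YYY</td></tr>"
        + "<tr><td>XXX</td><td>YYY</td></tr>"
        + "</tbody></table></div></div>"
        + "</form></body></html>";

    // Evaluates the expression with the first AB div as context node
    // and returns how many nodes it selects.
    static int countRows(String expression) throws Exception {
        Document document = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(SAMPLE)));
        XPath xpath = XPathFactory.newInstance().newXPath();

        NodeList abList = (NodeList) xpath.evaluate(
                "//div[@class='AB']", document, XPathConstants.NODESET);
        Node firstAb = abList.item(0);

        NodeList trList = (NodeList) xpath.evaluate(
                expression, firstAb, XPathConstants.NODESET);
        return trList.getLength();
    }

    public static void main(String[] args) throws Exception {
        // Absolute path: starts at the document root, whose element is html, not div.
        System.out.println(countRows("/div/table//tr")); // prints 0
        // Relative path: starts at the context node (the first AB div).
        System.out.println(countRows("div/table//tr"));  // prints 2
    }
}
```

With this sample, the relative expression walks from the AB div to its child div, then to that div's child table, and then to every tr beneath it.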


2 Comments

mickthompson Over a year ago

You're right, Pavel! I thought that I was passing the 'context' to the evaluate() method as the 2nd parameter. I think I tried it without the / before posting here, but maybe I also changed something else in the meantime and it didn't work at the time. Anyway, it's working now. Thanks a lot for your help!

Pavel Minaev Over a year ago

You are passing the context there. The problem is that by using leading / in the query you're telling it to start the path not from the context node, but from the root of the document to which the node belongs.
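A small sketch of that point, assuming a hypothetical class `ContextRootDemo` and a simplified inline document: the context node is honored (`.` selects the div itself), while an expression beginning with `/` is resolved from the root of the document that owns that node.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class ContextRootDemo {

    // Hypothetical, simplified document; only the nesting matters here.
    static final String SAMPLE =
        "<html><body><div class='AB'><p>text</p></div></body></html>";

    // Evaluates the expression with the AB div as context node and
    // returns the name of the first selected node, or null if none.
    static String firstMatchName(String expression) throws Exception {
        Document document = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(SAMPLE)));
        XPath xpath = XPathFactory.newInstance().newXPath();

        Node abDiv = (Node) xpath.evaluate(
                "//div[@class='AB']", document, XPathConstants.NODE);
        Node match = (Node) xpath.evaluate(
                expression, abDiv, XPathConstants.NODE);
        return match == null ? null : match.getNodeName();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(firstMatchName("."));  // the context node itself: div
        System.out.println(firstMatchName("/*")); // the document's root element: html
    }
}
```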

skaffman · Accepted Answer · 2009-07-29 22:02:10Z

0

Are you sure this is XHTML? There's no namespace declared in your sample document, and without that namespace, it's not XHTML. If there is a namespace, and you missed that out of your sample for brevity, then your XPath expressions need to reference the namespace also, otherwise they won't select anything.
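For the case where the document does declare the XHTML namespace, the following is a rough sketch of how the expressions could reference it. It assumes the document is parsed with a namespace-aware `DocumentBuilderFactory`; the class name `NamespacedXPathDemo`, the helper `countAbDivs`, the prefix `x`, and the inline sample are all illustrative choices, not anything from the question.

```java
import java.io.StringReader;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class NamespacedXPathDemo {

    static final String XHTML_NS = "http://www.w3.org/1999/xhtml";

    // Hypothetical sample that, unlike the one in the question, declares the namespace.
    static final String SAMPLE =
          "<html xmlns='" + XHTML_NS + "'><body>"
        + "<div class='AB'/><div class='AB'/>"
        + "</body></html>";

    static int countAbDivs(String expression) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // assumption: namespace-aware parsing
        Document document = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(SAMPLE)));

        XPath xpath = XPathFactory.newInstance().newXPath();
        // Map the arbitrary prefix "x" to the XHTML namespace URI.
        xpath.setNamespaceContext(new NamespaceContext() {
            @Override
            public String getNamespaceURI(String prefix) {
                return "x".equals(prefix) ? XHTML_NS : XMLConstants.NULL_NS_URI;
            }
            @Override
            public String getPrefix(String namespaceURI) {
                return null; // not needed for evaluating expressions in this sketch
            }
            @Override
            public Iterator<String> getPrefixes(String namespaceURI) {
                return null; // not needed for evaluating expressions in this sketch
            }
        });

        NodeList result = (NodeList) xpath.evaluate(
                expression, document, XPathConstants.NODESET);
        return result.getLength();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(countAbDivs("//div[@class='AB']"));   // unprefixed: prints 0
        System.out.println(countAbDivs("//x:div[@class='AB']")); // prefixed: prints 2
    }
}
```

The prefix used in the expression does not have to match anything in the document; only the namespace URI it maps to matters.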


1 Comment

mickthompson Over a year ago

Hi skaffman, I'm retrieving the ABlist of divs correctly. It's only the way I try to extract the trList that isn't working. Actually you're right, the document doesn't specify any namespace, so maybe it can only be called XML. It conforms to the XML spec without specifying any namespace.
