Below is a fragment of an HTML document in which I need to associate a "title" cell, e.g. FILE_BYTES_WRITTEN, with the text() of the first <td> that follows it.
The following XPath works great in Python with lxml:
//td[text()='FILE_BYTES_WRITTEN']/following-sibling::td
The doc fragment:
<td>HDFS_BYTES_READ</td>
<td align="right">4,825</td>
<td align="right">0</td>
<td align="right">4,825</td>
</tr>
<tr>
<td>FILE_BYTES_WRITTEN</td>
<td align="right">415,881</td>
<td align="right">48,133</td>
<td align="right">464,014</td>
</tr>
<tr>
<td>HDFS_BYTES_WRITTEN</td>
<td align="right">98,580,205</td>
<td align="right">2,010</td>
<td align="right">98,582,215</td>
</tr>
But when I try to do the same in Java, I am having less success. I am not sure whether any Java HTML parser supports the following-sibling axis. I am presently using HtmlCleaner.
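
For reference, here is a minimal sketch of the lookup I am after on the Java side. It uses only the JDK's built-in javax.xml.xpath and does not involve HtmlCleaner at all. It assumes the fragment is well-formed XML once wrapped in a root element, which real-world HTML often is not. The class name FollowingSiblingSketch, the helper firstValueAfter, and the SAMPLE constant are hypothetical names made up for illustration.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

final class FollowingSiblingSketch {

    // Assumption: the fragment from the question, wrapped in a <table> root
    // so that it parses as well-formed XML.
    static final String SAMPLE =
        "<table>"
        + "<tr><td>HDFS_BYTES_READ</td><td align=\"right\">4,825</td>"
        + "<td align=\"right\">0</td><td align=\"right\">4,825</td></tr>"
        + "<tr><td>FILE_BYTES_WRITTEN</td><td align=\"right\">415,881</td>"
        + "<td align=\"right\">48,133</td><td align=\"right\">464,014</td></tr>"
        + "<tr><td>HDFS_BYTES_WRITTEN</td><td align=\"right\">98,580,205</td>"
        + "<td align=\"right\">2,010</td><td align=\"right\">98,582,215</td></tr>"
        + "</table>";

    // Hypothetical helper: returns the text of the first <td> that follows
    // the <td> whose text equals counterName, or "" if nothing matches.
    // Assumes counterName contains no single quote, since it is spliced
    // directly into the XPath string literal.
    static String firstValueAfter(String xml, String counterName) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        // [1] on the following-sibling axis selects the nearest sibling.
        String expr = "//td[text()='" + counterName + "']/following-sibling::td[1]";
        try {
            return xpath.evaluate(expr, new InputSource(new StringReader(xml)));
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        // prints 415,881
        System.out.println(firstValueAfter(SAMPLE, "FILE_BYTES_WRITTEN"));
    }
}
```

This only shows that the following-sibling axis is available in the JDK's XPath 1.0 engine when the input parses as XML. Whether the same expression can be evaluated against HtmlCleaner's output is exactly what I have not been able to work out.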