I am trying to get the text from a specific node's parent. For example:

<td colspan="1" rowspan="1">
  <span>
    <a class="info" shape="rect" 
             rel="empLinkData" href="/employee.htm?id=8468524">
        Jack Johnson
    </a>
  </span>
   (*)&nbsp;
</td>

I am able to successfully process the anchor tag by using:

$xNodes = $xpath->query('//a[@class="info"][@rel="empLinkData"]');

// $xNodes contains employee ids and names
foreach ($xNodes as $xNode)
{
    $sLinktext = @$xNode->firstChild->data;
    $sLinkurl = 'http://www.company.com' . $xNode->getAttribute('href');

    if ($sLinktext != '' && $sLinkurl != '')
    {
        echo '<li><a href="' . $sLinkurl . '">' .
                $sLinktext . '</a></li>';
    }
}

Now, I need to retrieve the text from the <td> tag (in this case, the (*)&nbsp; appearing right after the span tag closes), but I can't seem to refer to it properly.

The xpath for this that seems to make the most sense to me is:

$xNodes = $xpath->query('//a[@class="info"]
          [@rel="empLinkData"]/ancestor::*');

but it returns the wrong data, pulled from elements nested higher up in the document.

  • Thanks for the quick response! Assuming this query is correct, how would I actually display the data (see the foreach example above)? $xNode->firstChild->data is not working. Commented Jul 8, 2012 at 23:01
  • Kimono is a real cool tool for uncovering xpath: kimonolabs.com Commented May 2, 2014 at 23:14

3 Answers

It's not necessary to retreat back up the tree. Instead, directly select the td that contains the relevant element:

//td[descendant::a[@class="info"][@rel="empLinkData"]]/text()

Edit: As @Dimitre rightly pointed out, this selects all text children. Your td has two such nodes: the whitespace-only text node that precedes the span and the text node that follows it. If you only want the second text node, then use:

//td[descendant::a[@class="info"][@rel="empLinkData"]]/text()[2]

Or:

//td[descendant::a[@class="info"][@rel="empLinkData"]]/text()[last()]

As you can see, the resulting expressions are essentially the same, but you do need to target the correct text node (if you want only one). Note also that if the target text is truly in a td then it's safer to target that element type directly (without wildcards). As this is HTML, your actual document almost certainly contains several other elements, including multiple other anchors that you may not want to target.

Sample PHP:

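$nodes = $xpath->query(
    '//td[descendant::a[@class="info"][@rel="empLinkData"]]/text()[last()]');
// item(0) returns null when nothing matches, so check the length first
if ($nodes->length > 0) {
    echo "[" . $nodes->item(0)->nodeValue . "]";
}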
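
For a self-contained run, here is a minimal sketch of the same query against the markup from the question. The table/tr wrapper, the variable names, and the non-breaking-space cleanup are my own assumptions and are not taken from the original page:

```php
<?php
// Markup from the question, wrapped in <table><tr> (an assumption) so the
// HTML parser sees the cell in a valid context.
$html = <<<'HTML'
<table><tr>
<td colspan="1" rowspan="1">
  <span>
    <a class="info" shape="rect" rel="empLinkData" href="/employee.htm?id=8468524">
        Jack Johnson
    </a>
  </span>
   (*)&nbsp;
</td>
</tr></table>
HTML;

libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// Last text node that is a direct child of the matching cell.
$nodes = $xpath->query(
    '//td[descendant::a[@class="info"][@rel="empLinkData"]]/text()[last()]'
);

$marker = '';
if ($nodes->length > 0) {
    // &nbsp; comes back as the UTF-8 bytes C2 A0, which trim() leaves alone,
    // so turn it into a plain space before trimming.
    $marker = trim(str_replace("\xC2\xA0", ' ', $nodes->item(0)->nodeValue));
}

echo '[' . $marker . ']', PHP_EOL;
```

Using text()[last()] rather than text()[2] avoids depending on whether the parser keeps the whitespace-only node in front of the span.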

Select the nearest td ancestor of the anchor (the deepest one in the tree):

//a[@class="info"][@rel="empLinkData"]/ancestor::td[1]
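
To show how this could plug into the foreach loop from the question, here is a rough sketch that runs the expression relative to each anchor by passing it as the context node. The second row ("Jane Doe") is made-up sample data, and the variable names and the non-breaking-space cleanup are assumptions on my part:

```php
<?php
// First cell is from the question; the second row is hypothetical sample data.
$html = <<<'HTML'
<table>
<tr><td><span><a class="info" rel="empLinkData" href="/employee.htm?id=8468524">
        Jack Johnson
    </a></span> (*)&nbsp;</td></tr>
<tr><td><span><a class="info" rel="empLinkData" href="/employee.htm?id=1">
        Jane Doe
    </a></span> (**)&nbsp;</td></tr>
</table>
HTML;

libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

$rows = [];
foreach ($xpath->query('//a[@class="info"][@rel="empLinkData"]') as $a) {
    // The second argument makes $a the context node of the relative expression.
    $td = $xpath->query('ancestor::td[1]', $a)->item(0);
    if ($td === null) {
        continue; // this anchor is not inside a table cell
    }
    // Last text node that is a direct child of the cell.
    $text = $xpath->query('text()[last()]', $td)->item(0);
    $marker = $text === null
        ? ''
        : trim(str_replace("\xC2\xA0", ' ', $text->nodeValue));
    $rows[] = [trim($a->textContent), $marker];
}

foreach ($rows as [$name, $marker]) {
    echo $name, ' ', $marker, PHP_EOL;
}
```

Because ancestor is a reverse axis, [1] picks the closest td, so nested tables do not pull in an outer cell.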


Use:

//*[a[@class="info"][@rel="empLinkData"]]/following-sibling::text()[1]

This selects a single text node -- exactly the wanted one.

Do note that an XPath expression like:

//td[descendant::a[@class="info"][@rel="empLinkData"]]/text() 

selects more than one text node, not only the wanted one.
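
The difference can be checked with a short sketch. I am assuming here that the fragment is parsed as XML, with &nbsp; written as &#160; because plain XML does not define that entity; a real page would presumably go through loadHTML instead. The variable names are mine:

```php
<?php
// The question's fragment as well-formed XML (assumption: &nbsp; replaced by &#160;).
$xml = <<<'XML'
<td colspan="1" rowspan="1">
  <span>
    <a class="info" shape="rect" rel="empLinkData" href="/employee.htm?id=8468524">
        Jack Johnson
    </a>
  </span>
   (*)&#160;
</td>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);

// Every text child of the cell, including the whitespace before the span.
$all = $xpath->query(
    '//td[descendant::a[@class="info"][@rel="empLinkData"]]/text()'
);

// Only the first text node that follows the element wrapping the anchor.
$one = $xpath->query(
    '//*[a[@class="info"][@rel="empLinkData"]]/following-sibling::text()[1]'
);

echo $all->length, PHP_EOL; // 2
echo $one->length, PHP_EOL; // 1

// Replace the non-breaking space (UTF-8 bytes C2 A0) so trim() can remove it.
$wanted = trim(str_replace("\xC2\xA0", ' ', $one->item(0)->nodeValue));
echo $wanted, PHP_EOL;      // (*)
```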
