
I have this HTML and want to run an XPath query on it:

<b>Random Field:</b>
<p>
   A random field describes an <a href="/index.php?page=glossary&term_id=230">
   experiment</a> with outcomes being functions of more than one continuous variable, 
   for example U(x,y,z), where  x, y, and z are coordinates in space. Random field is 
   extension of the concept of <a href="/index.php?page=glossary&term_id=598">random 
   process</a> into the case of multivariate argument.
</p>

I tried this to get the text inside the <p> tag:

$dom = new DomDocument();
$dom->loadHtml($curl_scraped_page);
$xpath = new DomXPath($dom);
print $xpath->evaluate('string(//p[preceding::b]/text())');

But it just gave me this:

A random field describes an

What I want is:

A random field describes an ... (and so on, until) ... of multivariate argument.

So I'm guessing the problem lies with the <a> tag, because every time I try this on a document with the same pattern, the output stops right before the <a> tag. Thanks.
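One way to check that guess is to list the text nodes that //p[preceding::b]/text() actually selects. The sketch below assumes a shortened, hypothetical copy of the markup above, and calls libxml_use_internal_errors(true) only to keep the parser warnings about the bare & in the hrefs out of the output.

```php
<?php
// Sketch: list the text nodes that //p[preceding::b]/text() selects.
// $html is a shortened, hypothetical copy of the question's markup.

// Collect parser warnings internally (the bare "&" in the hrefs triggers some).
libxml_use_internal_errors(true);

$html = '<b>Random Field:</b>'
      . '<p>A random field describes an '
      . '<a href="/index.php?page=glossary&term_id=230">experiment</a>'
      . ' with outcomes being functions of more than one continuous variable.'
      . ' Random field is extension of the concept of '
      . '<a href="/index.php?page=glossary&term_id=598">random process</a>'
      . ' into the case of multivariate argument.</p>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Each <a> element splits the paragraph's direct text into a separate text node.
$textNodes = $xpath->query('//p[preceding::b]/text()');
for ($i = 0; $i < $textNodes->length; $i++) {
    echo $i, ': "', $textNodes->item($i)->nodeValue, '"', PHP_EOL;
}

// string() on a node-set uses only the first node, hence the truncated output.
echo $xpath->evaluate('string(//p[preceding::b]/text())'), PHP_EOL;
```

With this input the query should return three separate text nodes, and the string() call returns only the first one, which matches the truncated result above.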

Answer

This would work:

print $xpath->query('//p[preceding::b]')->item(0)->textContent;

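Your original expression stops early because string() applied to a node-set returns the string-value of only the first node, and //p[preceding::b]/text() selects several text nodes: the <a> elements split the paragraph's text into separate pieces. XPath 2.0 has a string-join() function that would join them, but sadly it isn't available in XPath 1.0, which is the version libxml (and therefore PHP) implements.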
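A self-contained sketch of the textContent approach, using a shortened, hypothetical copy of the question's HTML. It also shows string(//p[preceding::b]) (the <p> element itself, without /text()). That variant is my addition, not part of the answer above. It should give the same text in plain XPath 1.0, because an element's string-value is the concatenation of all its descendant text nodes.

```php
<?php
// Sketch: two ways to get the whole paragraph text with PHP's XPath 1.0.
// $html is a shortened, hypothetical copy of the question's markup.

// Collect parser warnings internally (the bare "&" in the hrefs triggers some).
libxml_use_internal_errors(true);

$html = '<b>Random Field:</b>'
      . '<p>A random field describes an '
      . '<a href="/index.php?page=glossary&term_id=230">experiment</a>'
      . ' with outcomes being functions of more than one continuous variable.'
      . ' Random field is extension of the concept of '
      . '<a href="/index.php?page=glossary&term_id=598">random process</a>'
      . ' into the case of multivariate argument.</p>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// 1. The answer's approach: select the <p> element and read textContent,
//    which concatenates all descendant text, including the text inside <a>.
$viaTextContent = $xpath->query('//p[preceding::b]')->item(0)->textContent;

// 2. Alternative (not from the answer): string() of the <p> element itself,
//    whose string-value is likewise the concatenation of its descendant text.
$viaXPathString = $xpath->evaluate('string(//p[preceding::b])');

echo $viaTextContent, PHP_EOL;
echo $viaXPathString, PHP_EOL;
```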


Comments

Thank you ^^ What's the difference between evaluate() and query()? And what does item(0) mean there? Thanks.
->evaluate() returns a typed result if it can (a DOMNodeList, a string, a number or a boolean, depending on the expression), while ->query() always returns a DOMNodeList. ->item(0) gets the first (0-indexed) item from that list, in this case the sole <p/> element. If there are more <p/> nodes you wish to capture, loop through the DOMNodeList that ->query() returns and concatenate the ->textContent of its items 'manually'.
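A rough illustration of both points in the comment, assuming a hypothetical two-paragraph document (not the question's page): evaluate() returning different types depending on the expression, and a manual loop that joins the textContent of every matched <p>. Joining with "\n" and trimming each part are arbitrary choices for this sketch.

```php
<?php
// Sketch: evaluate() vs query(), and concatenating several <p> elements.
// The two-paragraph HTML is a hypothetical example, not from the question.

// Collect any parser warnings internally, in case real pages have bad markup.
libxml_use_internal_errors(true);

$html = '<b>Random Field:</b>'
      . '<p>First <a href="#">linked</a> paragraph.</p>'
      . '<p>Second <a href="#">linked</a> paragraph.</p>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// evaluate() returns whatever type the XPath expression produces.
$count = $xpath->evaluate('count(//p[preceding::b])');  // number (PHP float)
$first = $xpath->evaluate('string(//p[preceding::b])'); // string, first <p> only
$nodes = $xpath->evaluate('//p[preceding::b]');         // DOMNodeList

// query() always returns a DOMNodeList, so ->length and ->item($i) are available.
$paragraphs = $xpath->query('//p[preceding::b]');

// Concatenate the textContent of every matched <p> "manually".
$parts = [];
for ($i = 0; $i < $paragraphs->length; $i++) {
    $parts[] = trim($paragraphs->item($i)->textContent);
}
$allText = implode("\n", $parts);

var_dump($count, $first, get_class($nodes));
echo $allText, PHP_EOL;
```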
