8

I am using the PHP Simple HTML DOM Parser, but it does not seem to have any way to search for text. I need to search for a string and find the id of its parent element. Essentially the reverse of normal usage.

Does anyone know how?

4 Answers

9
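include_once('simple_html_dom.php');

$html = file_get_html('http://www.google.com/');

// innertext includes the markup of all descendants, so every ancestor of the
// matching text matches as well; print one id per line and skip elements without one
$eles = $html->find('*');
foreach($eles as $e) {
    if(strpos($e->innertext, 'theString') !== false && $e->id) {
        echo $e->id, "\n";
    }
}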

http://simplehtmldom.sourceforge.net/manual.htm


2 Comments

$e->id is the Simple DOM way to get the ID attribute. Perhaps try changing $eles = $html->find('*'); to $eles = $html->find('p, div'); or something.
Isn't it getAttribute('id')? I can't get it to work either way :S
6

Just imagine that every tag has a "plaintext" attribute and use standard attribute selectors.

So, HTML:

<div id="div1">
  <span>London is the capital</span> of Great Britain
</div>
<div id="div2">
  <span>Washington is the capital</span> of the USA
</div>

can be thought of as:

<div id="div1" plaintext="London is the capital  of Great Britain">
  <span plaintext="London is the capital ">London is the capital</span> of Great Britain
</div>
<div id="div2" plaintext="Washington is the capital  of the USA">
  <span plaintext="Washington is the capital ">Washington is the capital</span> of the USA
</div>

And the PHP to solve your task is just:

<?php
  $t = '
    <div id="div1">
      <span>London is the capital</span> of Great Britain
    </div>
    <div id="div2">
      <span>Washington is the capital</span> of the USA
    </div>';
  $html = str_get_html($t);
  $foo = $html->find('span[plaintext^=London]');
  echo "ID: " . $foo[0]->parent()->id; // div1
?>

(Keep in mind that "plaintext" for <span> tags is right-padded with a space; this is the default behaviour of Simple HTML DOM, defined by the constant DEFAULT_SPAN_TEXT.)
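For comparison, roughly the same "span whose text starts with London, then take the parent's id" lookup can be sketched with PHP's bundled DOM extension instead of Simple HTML DOM. This is only a sketch under assumptions: the helper name parentIdsOfSpansStartingWith() is made up for illustration, and it compares against textContent rather than Simple HTML DOM's "plaintext", so the trailing-space padding mentioned above does not apply.

```php
<?php
// Hypothetical helper (not part of Simple HTML DOM): returns the id of the
// parent of every <span> whose text starts with $prefix.
function parentIdsOfSpansStartingWith(string $html, string $prefix): array
{
    $doc = new DOMDocument();
    // Collect parser warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $ids = [];
    foreach ($doc->getElementsByTagName('span') as $span) {
        $parent = $span->parentNode;
        // textContent is the concatenated text of the element and its descendants.
        if (strpos(ltrim($span->textContent), $prefix) === 0
            && $parent instanceof DOMElement
            && $parent->hasAttribute('id')) {
            $ids[] = $parent->getAttribute('id');
        }
    }
    return $ids;
}

$t = '<div id="div1"><span>London is the capital</span> of Great Britain</div>
      <div id="div2"><span>Washington is the capital</span> of the USA</div>';
echo implode(', ', parentIdsOfSpansStartingWith($t, 'London')), "\n"; // div1
```

Doing the string comparison in PHP rather than in a selector avoids any quoting issues with the search text.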

1 Comment

so far the best answer
3
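$d = new DOMDocument();
$d->loadXML($xml); // $xml holds the markup to search
$x = new DOMXPath($d);
// id attributes of every ancestor of a text node containing the search string
$result = $x->evaluate("//text()[contains(.,'617.99')]/ancestor::*/@id");
$unique = null;
// walk the ids backwards and stop at the first one that occurs exactly once in the document
for($i = $result->length -1;$i >= 0 && $item = $result->item($i);$i--){
    // note: addslashes() is not real XPath escaping (XPath 1.0 string literals have no escape syntax)
    if($x->query("//*[@id='".addslashes($item->value)."']")->length == 1){
        echo 'Unique ID is '.$item->value."\n";
        $unique = $item->value;
        break;
    }
}
if(is_null($unique)) echo 'no unique ID found';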

9 Comments

This is PHP's DOMDocument, not the SimpleHTMLDom Library as the OP stated he was using.
Ack, missed that. Still can't get my head around people using that slow, slow thingamajig, but you're right, this isn't the answer the OP is looking for then.
Sure there is: before loading, set $d->recover = true;$d->strictErrorChecking = false;, and of course use loadHTML() instead of loadXML() for HTML. If you still get too many errors that you cannot ignore (never display errors on production sites), you can set libxml_use_internal_errors(true); to handle them separately from other PHP errors.
Ack, wrapper is not what we want :). My bad, my XPath is a bit rusty, try //text()[contains(.,'617.99')]/parent::*/@id, seems to work here.
Warnings can be disabled either by prepending @ (@$d->loadHTML($html);), which is kinda evil, or by using libxml_use_internal_errors(true);$d->loadHTML($html);libxml_clear_errors(); (preferred IMHO). An id should be unique, but we all know it sometimes isn't. You can check with $x->query("//*[@id='theid']")->length == 1 (for priceIncTaxSpan3047 it is, but look at the 50 Table_01's, no wonder DOMDocument protests :)
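
Putting the suggestions from these comments together (loadHTML() instead of loadXML(), libxml_use_internal_errors(), and the parent::*/@id XPath), a minimal sketch could look like the following. The function name findParentIds() and the sample markup are assumptions made for illustration, and the sketch simply refuses search strings containing a single quote because XPath 1.0 string literals have no escape syntax.

```php
<?php
// Hypothetical helper: ids of the elements that directly contain a text node
// with $needle in it. Assumes $needle contains no single quote.
function findParentIds(string $html, string $needle): array
{
    if (strpos($needle, "'") !== false) {
        throw new InvalidArgumentException('needle must not contain a single quote');
    }

    // Collect HTML parser warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // parent::* (rather than ancestor::*) keeps only the immediate parent.
    $attrs = $xpath->query("//text()[contains(., '" . $needle . "')]/parent::*/@id");

    $ids = [];
    foreach ($attrs as $attr) {
        $ids[] = $attr->value;
    }
    return $ids;
}

$sample = '<div id="div1"><span>London is the capital</span> of Great Britain</div>'
        . '<div id="div2"><span>Washington is the capital</span> of the USA</div>';

print_r(findParentIds($sample, 'Great Britain')); // one entry: div1
```

A parent without an id attribute contributes nothing to the result, so searching the sample for 'London' returns an empty array: that text sits inside a <span> that has no id.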
3

Got the answer. The entire example is a little long but it works. I also show the output.

The HTML for what we are going to look at:

<html>
<head>
<title>Simple HTML DOM - Find Text</title>
</head>
<body>
<h3>Simple HTML DOM - Find Text</h3>
<div id="first">
 <p>This is a paragraph inside of div 'first'.
   This paragraph does not have the text we are looking for.</p>
 <p>As a matter of fact this div does not have the text we are looking for</p>
</div>
<div id="second">
 <ul>
  <li>This is an unordered list.
  <li id="love1">We are looking for the following word love.
  <li>Does not contain the word.
 </ul>
 <p id="love2">This paragraph which is in div second contains the word love.</p>
</div>
<div id="third">
 <a id="love3" href="goes.nowhere.com">link to love site</a>
</div>
</body>
</html>

The PHP:

<?php
include_once('simple_html_dom.php');

function scraping_for_text($iUrl,$iText)
{
    echo "iUrl=".$iUrl."<br />";
    echo "iText=".$iText."<br />";

    // create HTML DOM
    $html = file_get_html($iUrl);

    // collect the text nodes that contain the search text
    $aFound = array();
    foreach ($html->find('text') as $key=>$oText)
    {
      if (strpos($oText->plaintext,$iText) !== FALSE)
      {
         $aFound[$key] = $oText;
      }
    }

    // report whether anything actually matched
    if (count($aFound) > 0)
    {
       echo "<h4>Found ".$iText."</h4>";
    }
    else
    {
       echo "<h4>No ".$iText." found"."</h4>";
    }
    foreach ($aFound as $key=>$oLove)
    {
      echo $key.": text=".$oLove->plaintext."<br />"
           ."--- parent tag=".$oLove->parent()->tag."<br />"
           ."--- parent id=".$oLove->parent()->id."<br />";
    }

    // clean up memory
    $html->clear();
    unset($html);

    return;
}

// -------------------------------------------------------------
// test it!

// user_agent header...
ini_set('user_agent', 'My-Application/2.5');

scraping_for_text("test_text.htm","love");
?>

The output:

iUrl=test_text.htm
iText=love
Found love
18: text=We are looking for the following word love.
--- parent tag=li
--- parent id=love1
21: text=This paragraph which is in div second contains the word love.
--- parent tag=p
--- parent id=love2
25: text=link to love site
--- parent tag=a
--- parent id=love3

That's all they wrote!!!!

1 Comment

Great example. Would you know how to go from text, back to an element? I want to search by text and then find the nearest element. It's from an old table layout without any classes or IDs.
