
OK, so I have been battling with this for a while now; maybe someone can help me.

I'm trying to get the email link from this HTML:

<div id="field_11" class="fieldRow span12 lastFieldRow">
  <span class="caption">E-mail</span>
  <span class="output">
   <script type="text/javascript">
    <!--
     document.write('<a hr'+'ef="mai'+'lto'+':'+
      '%40;%67;%6d;%61;%69;%6c;<\/a>');
    //-->
   </script>
   <a href="mailto:%40%67%6d%61%69%6c">@mail</a>
  </span>
</div>

I'm trying to get the '@mail' part of the HTML, i.e. the text of the a href="mailto:..." link. I do NOT want the document.write() part, but the last tag in the code.

For some reason, whenever I try to get the children of the span with the output class, it thinks the span has only one child, the script tag, so I just can't seem to grab the email as plain text.
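For comparison, here is a minimal sketch using PHP's built-in DOM extension (not simple_html_dom, which the code below uses) that lists the element children of the span with the output class. It assumes the HTML above is already available as a string (inlined here), and it skips whitespace text nodes. With this parser the span has two element children, the script and the a.

```php
<?php
// Sketch with PHP's built-in DOM extension (an assumption; the question uses
// simple_html_dom). The question's HTML is inlined instead of fetched.
$html = <<<'HTML'
<div id="field_11" class="fieldRow span12 lastFieldRow">
  <span class="caption">E-mail</span>
  <span class="output">
   <script type="text/javascript">
    <!--
     document.write('<a hr'+'ef="mai'+'lto'+':'+
      '%40;%67;%6d;%61;%69;%6c;<\/a>');
    //-->
   </script>
   <a href="mailto:%40%67%6d%61%69%6c">@mail</a>
  </span>
</div>
HTML;

libxml_use_internal_errors(true); // keep parser warnings out of the output
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// The span whose class attribute is exactly "output".
$span = $xpath->query('//span[@class="output"]')->item(0);

// Collect the tag names of its element children, skipping whitespace text nodes.
$tags = [];
foreach ($span->childNodes as $child) {
    if ($child instanceof DOMElement) {
        $tags[] = $child->nodeName;
    }
}

echo implode(',', $tags), "\n"; // script,a
```
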

Here is what I have so far:

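include_once 'simple_html_dom.php'; // simple_html_dom library; path assumed

$target_url = "some_web_site";
$html = new simple_html_dom();
$html->load_file($target_url);

foreach ($html->find('span[class=output]') as $d) {
    // children(1) is the SECOND child element; if the parser sees only one
    // child (the script tag), it returns null and ->plaintext raises a notice.
    echo $d->children(1)->plaintext . "<br />";
}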

Any help?

  • Your code should work; what's the output of it (or the error message)? Commented Apr 29, 2014 at 17:46
  • It prints out a bunch of these errors: Notice: Trying to get property of non-object in /Applications/MAMP/htdocs/webcrawler/index.php on line 224 Commented Apr 29, 2014 at 17:47
  • Sounds like your load_file() isn't loading correctly. Can you try removing the 2nd and 3rd lines (both beginning with $html) and replacing them with $html = file_get_html($target_url);? Commented Apr 29, 2014 at 18:12
  • @LaughDonor - tried your approach, still got those errors. Commented Apr 29, 2014 at 18:44
  • Well, the main reason you're having this problem is $html->find('span[class=output]') is returning null. You need to check to make sure your selectors are correct. Maybe using span.output instead? Commented Apr 29, 2014 at 18:50

1 Answer

It is possible with just DOM + XPath, too.

$dom = new DOMDocument();
$dom->loadHtml($html);            // $html holds the page source as a string
//$dom->loadHtmlFile($htmlFile);  // or load the document from a file instead
$xpath = new DOMXpath($dom);

var_dump(
  $xpath->evaluate(
    'string(//span[@class="output"]//a[starts-with(@href, "mailto:")])'
  )
);

Output: https://eval.in/148063

string(5) "@mail"
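A self-contained version of the snippet above, as a sketch: it inlines the question's HTML in a nowdoc string instead of loading the (unnamed) site, and enables libxml_use_internal_errors() so that warnings about messy real-world markup do not clutter the output.

```php
<?php
// Sketch: the question's HTML inlined instead of fetched from the site.
$html = <<<'HTML'
<div id="field_11" class="fieldRow span12 lastFieldRow">
  <span class="caption">E-mail</span>
  <span class="output">
   <script type="text/javascript">
    <!--
     document.write('<a hr'+'ef="mai'+'lto'+':'+
      '%40;%67;%6d;%61;%69;%6c;<\/a>');
    //-->
   </script>
   <a href="mailto:%40%67%6d%61%69%6c">@mail</a>
  </span>
</div>
HTML;

// Real-world markup often triggers libxml warnings; collect them quietly.
libxml_use_internal_errors(true);

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// The <a> inside document.write() is only script text to the parser, so the
// only matching element is the real <a href="mailto:..."> after the script.
$email = $xpath->evaluate(
    'string(//span[@class="output"]//a[starts-with(@href, "mailto:")])'
);

var_dump($email); // string(5) "@mail"
```
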

The XPath expression first selects all span elements whose class attribute is exactly "output":

//span[@class="output"]

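Then it looks for descendant a elements whose href attribute starts with "mailto:" (the a inside the document.write() call is only script text to the parser, so it is not an element and is not matched):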

//span[@class="output"]//a[starts-with(@href, "mailto:")]

The result is a list of a element nodes (with the example content, a single node). The string() function converts the first node of the list into a string; if the list is empty, it returns an empty string.

string(//span[@class="output"]//a[starts-with(@href, "mailto:")])
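To illustrate string() versus iterating over the node list, here is a sketch on a made-up snippet with two hypothetical addresses (not the asker's page): string() returns only the first match in document order, an empty node list becomes an empty string, and DOMXPath::query() returns every matching node.

```php
<?php
// Made-up HTML with two e-mail fields; the example.com addresses are hypothetical.
$html = '<div>'
    . '<span class="output"><a href="mailto:a@example.com">a@example.com</a></span>'
    . '<span class="output"><a href="mailto:b@example.com">b@example.com</a></span>'
    . '</div>';

libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

$expr = '//span[@class="output"]//a[starts-with(@href, "mailto:")]';

// string() converts only the first node of the list.
$first = $xpath->evaluate("string($expr)");

// An empty node list is converted to an empty string, not an error.
$missing = $xpath->evaluate('string(//span[@class="missing"]//a)');

// query() returns every matching node; read the text of each one.
$all = [];
foreach ($xpath->query($expr) as $a) {
    $all[] = $a->textContent;
}

var_dump($first);   // string(13) "a@example.com"
var_dump($missing); // string(0) ""
var_dump($all);
```

If a page can contain more than one e-mail field, the query() loop is the safer choice, since string() silently drops every match after the first.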
