1

a.php:

<ul id="ul1">
    <li id="pt1">Point 1
         <ul id="ul2">
             <li id="pt11">Point 1.1</li>
             <li id="pt12">Point 1.2</li>
                <pre class="CodeDisplay">
                some codes
                </pre>
             <li id="ref">Reference: <a href="link.html" target="_blank">link</a></li>
         </ul>
    </li> 
</ul>

I would like to get the nodeValue "Point 1" only. In JS, it is:

alert(document.getElementsByTagName("li")[0].childNodes[0].nodeValue);

But I would like to get the same value in PHP using Simple HTML DOM. Here is the code snippet from another PHP page (b.php):

<?php

include('simple_html_dom.php');
$html = file_get_html('http://lifelearning.net63.net/a.php');

// stuck here:
echo $html->getElementsByTagName('ul',0)->getElementsByTagName('li',0)->nodeValue;
//

?>

I have tried textContent, but it also extracts the content of all the descendants under Point 1. This is not what I want. I only want "Point 1". Any help is appreciated!

3 Answers

1

Try this:

<?php
include('simple_html_dom.php');
$html = file_get_html('http://lifelearning.net63.net/a.php');
echo $html->find('li[id=pt1] li', 0)->innertext;

The snippet above finds the first li tag that is a descendant of li#pt1 and gives you its inner text (the content between the tags, including any HTML inside it).

Have a look at the SimpleHTMLDom docs. They show many ways to find content (by ID, class, etc.) in the HTML output. SimpleHTMLDom mostly follows jQuery/CSS selectors.

Note that if you do not read the innertext property, find() gives you a SimpleHTMLDom node that you need to process before displaying.

If there is no matching element, find() with an index returns null, and reading innertext from it raises a PHP notice or warning. So make sure your input contains the required elements, or check that the returned element is not null before using it.


1 Comment

Thanks for your reply. But it actually returns "Point 1.1" instead of "Point 1".
1

With the help of others online, a simpler solution is suggested:

$html = new DOMDocument();
$html->loadHTMLFile('http://lifelearning.net63.net/a.php');
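// The first child node of the first <li> is its leading text node; trim() drops the trailing newline and indentation.
echo trim($html->getElementsByTagName('li')->item(0)->childNodes->item(0)->textContent); // prints "Point 1"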

What I've learnt:

First, no external library is required in my case; DOMDocument does the job of loading the HTML DOM of a web page.

Second, use item() and childNodes, much like in JS:

document.getElementsByTagName("li")[0].childNodes[0].nodeValue
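
As a minimal sketch of the same idea, assuming the li may contain more than one direct text node, the loop below collects only the text nodes that are direct children of the element and skips nested elements. The inline $markup string (a trimmed copy of a.php) and the helper name ownText() are hypothetical, introduced here only for illustration.

```php
<?php
// Hypothetical inline copy of a.php (trimmed) so the sketch runs without a network request.
$markup = <<<'HTML'
<ul id="ul1">
    <li id="pt1">Point 1
         <ul id="ul2">
             <li id="pt11">Point 1.1</li>
             <li id="pt12">Point 1.2</li>
             <li id="ref">Reference: <a href="link.html" target="_blank">link</a></li>
         </ul>
    </li>
</ul>
HTML;

// Hypothetical helper: concatenate only the direct text-node children of $node.
function ownText(DOMNode $node): string
{
    $text = '';
    foreach ($node->childNodes as $child) {
        if ($child->nodeType === XML_TEXT_NODE) {
            $text .= $child->nodeValue;
        }
    }
    // Drop the newlines and indentation that surround the text in the markup.
    return trim($text);
}

$doc = new DOMDocument();
$doc->loadHTML($markup);

// item() returns null when there is no such element, so check before use.
$li = $doc->getElementsByTagName('li')->item(0);
echo $li !== null ? ownText($li) : '(no <li> found)';
```

Filtering on XML_TEXT_NODE is what keeps "Point 1.1" and the other nested items out of the result, since those live inside the child ul element rather than in the li's own text nodes.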

But thank you for all your replies.

1 Comment

Frankly, you should accept your own answer because that regex solution is not recommended.
0

You may be looking for this:

<?php

// Note: "Point 1" is wrapped in a <div> here, unlike in the original a.php.
$str2 = <<<'HTML'
<ul id="ul1">
    <li id="pt1"><div>Point 1</div>
        <ul id="ul2">
            <li id="pt11">Point 1.1</li>
            <li id="pt12">Point 1.2</li>
            <pre class="CodeDisplay">
            some codes
            </pre>
            <li id="ref">Reference: <a href="link.html" target="_blank">link</a></li>
        </ul>
    </li>
</ul>
HTML;

function getTextBetweenTags($string, $tagname) {
    // Non-greedy match of the first <tagname ...>...</tagname> pair.
    $tag = preg_quote($tagname, '/');
    $pattern = "/<$tag\b[^>]*>(.*?)<\/$tag>/s";
    if (preg_match($pattern, $string, $matches)) {
        return $matches[1];
    }
    return null; // no matching tag pair found
}

$txt = getTextBetweenTags($str2, "div");
echo $txt;
?>

This will output: Point 1

2 Comments

OP is using SimpleHTMLDom already. [insert "Regex to parse HTML is bad" comment here]
This is error-prone advice. Regex is not DOM-aware.
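
For comparison, a DOM-aware way to get the same text without regex, and without adding a wrapper div, could look like the sketch below. It uses PHP's built-in DOMDocument and DOMXPath; the inline $markup string and the variable names are hypothetical stand-ins for the real page.

```php
<?php
// Hypothetical inline markup mirroring the relevant part of a.php.
$markup = '<ul id="ul1"><li id="pt1">Point 1 '
        . '<ul id="ul2"><li id="pt11">Point 1.1</li></ul></li></ul>';

$doc = new DOMDocument();
$doc->loadHTML($markup);
$xpath = new DOMXPath($doc);

// text()[1] selects the first text node that is a direct child of li#pt1.
$nodes = $xpath->query('//li[@id="pt1"]/text()[1]');

// query() returns false on an invalid expression and an empty list when nothing matches.
$pointText = ($nodes !== false && $nodes->length > 0)
    ? trim($nodes->item(0)->nodeValue)
    : null;

echo $pointText ?? '(not found)';
```

Because the XPath expression addresses the text node by its position in the tree, it does not depend on how the tags happen to be written in the source.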
