29

I am trying to get the inner HTML of div tags in a file using nodeValue. However, this code outputs only plain text and strips out all HTML tags inside the div. How can I change this code to output the div's HTML content rather than plain text, and also output the main div wrapping its child elements?

Example:

contents of file.txt:

<div class="1"><span class="test">text text text</span></div>
<div class="2"><span class="test">text text text</span></div>
<div class="3"><span class="test">text text text</span></div>

script.php:

$file = file_get_contents('file.txt');

$doc = new DOMDocument();

@$doc->loadHTML('<?xml encoding="UTF-8">'.$file);

$entries = $doc->getElementsByTagName('div');

for ($i = 0; $i < $entries->length; $i++) {
    $entry = $entries->item($i);
    echo $entry->nodeValue;
}

outputs: text text texttext text texttext text text

what I need it to output:

<div class="1"><span class="test">text text text</span></div>
<div class="2"><span class="test">text text text</span></div>
<div class="3"><span class="test">text text text</span></div>

Note that the parent divs must be output as well, wrapping the span tags.

HELP!

2 Answers

42

I have never done what you're attempting to do, but as a stab in the dark, using the API docs, does echo $entry->textContent; work?

Adding an update. This is from the comments located on the docs page for DOMNode:

Hi!

Combining all the comments, the easiest way to get the inner HTML of a node is to use this function:

<?php
function get_inner_html($node) {
    $innerHTML = '';
    $children = $node->childNodes;
    foreach ($children as $child) {
        $innerHTML .= $child->ownerDocument->saveXML($child);
    }

    return $innerHTML;
}
?>
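To make the difference between plain text and inner markup concrete, here is a minimal sketch that applies the same child-by-child approach to one of the divs from the question. The helper name `inner_html_sketch` and the inline sample string are illustrative assumptions, not part of the quoted comment.

```php
<?php
// Hypothetical helper mirroring the quoted function: serialize each child
// of $node and concatenate the results (the node's own tag is excluded).
function inner_html_sketch(DOMNode $node): string
{
    $html = '';
    foreach ($node->childNodes as $child) {
        $html .= $node->ownerDocument->saveXML($child);
    }
    return $html;
}

$doc = new DOMDocument();
// Sample markup copied from the question's file.txt.
$doc->loadHTML('<div class="1"><span class="test">text text text</span></div>');

$div = $doc->getElementsByTagName('div')->item(0);

$text  = $div->nodeValue;           // text only, tags stripped
$inner = inner_html_sketch($div);   // markup of the children, without the div itself

echo $text, "\n";
echo $inner, "\n";
```

Here `$text` holds only the characters inside the span, while `$inner` keeps the span tag and its attributes.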

Or, maybe a simpler method is to serialize the node directly, which also includes the wrapping div itself:

echo $doc->saveXML($entry);
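Applied to the loop from the question, that could look roughly like the sketch below. It assumes the three lines of file.txt are supplied inline (a stand-in for `file_get_contents('file.txt')`) and collects the results in a hypothetical `$lines` array so they can be printed one per line.

```php
<?php
// Stand-in for file_get_contents('file.txt'); the content matches the question.
$file = <<<HTML
<div class="1"><span class="test">text text text</span></div>
<div class="2"><span class="test">text text text</span></div>
<div class="3"><span class="test">text text text</span></div>
HTML;

$doc = new DOMDocument();
libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
$doc->loadHTML($file);
libxml_clear_errors();

$lines = [];
foreach ($doc->getElementsByTagName('div') as $entry) {
    // Passing a node to saveXML() serializes that node, including its own tags.
    $lines[] = $doc->saveXML($entry);
}

echo implode("\n", $lines), "\n";
```

Each element of `$lines` is one complete div, span included, which is the output the question asks for.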

7 Comments

Sorry, should have included this in the post, but here is where I found this: php.net/manual/en/class.domnode.php textContent = "This attribute returns the text content of this node and its descendants."
Nope, this does the same thing as nodeValue
According to a comment in the docs, traversing is the best way to get the innerHTML. Let me know if that works for you.
I have not tested the code you posted, but I found this:

function innerXML($node) {
    $doc = $node->ownerDocument;
    $frag = $doc->createDocumentFragment();
    foreach ($node->childNodes as $child) {
        $frag->appendChild($child->cloneNode(TRUE));
    }
    return $doc->saveXML($frag);
}

and it works. Thanks!
Cool, not sure I was much help, but hopefully, at the least, pointed you in the right direction.
16

Instead of:

echo $entry->nodeValue;

You have to use:

echo $doc->saveXML($entry);

Here is a more complete example that might help others too. $doccontent is the HTML block as a string:

$doccontent = '<html> …'; // your HTML string
$dom = new DOMDocument;
$internalErrors = libxml_use_internal_errors(true); // suppress parser warnings and remember the previous setting
// Convert non-ASCII characters to entities so UTF-8 text is parsed correctly
// (note: the 'HTML-ENTITIES' target is deprecated as of PHP 8.2).
$content_utf = mb_convert_encoding($doccontent, 'HTML-ENTITIES', 'UTF-8');
$dom->loadHTML($content_utf);
libxml_use_internal_errors($internalErrors); // restore the previous error-handling setting
$specialdiv = $dom->getElementById('xdiv');
if ($specialdiv !== null) // getElementById() returns null when no element matches
{
    echo $dom->saveXML($specialdiv);
}
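One thing to keep in mind is that saveXML() writes XML syntax, so void elements come out self-closed. Since PHP 5.3.6, saveHTML() also accepts a node argument and writes HTML syntax instead. The sketch below compares the two; the sample markup (including the `<br>`) and the variable names are assumptions made for illustration.

```php
<?php
// Illustrative markup: the id 'xdiv' matches the example above, the <br> is an assumption.
$markup = '<html><body><div id="xdiv">line one<br>line two</div></body></html>';

$dom = new DOMDocument();
$previous = libxml_use_internal_errors(true);   // silence parser warnings
$dom->loadHTML($markup);
libxml_use_internal_errors($previous);          // restore the earlier setting

$div = $dom->getElementById('xdiv');

if ($div !== null) {
    $asXml  = $dom->saveXML($div);    // XML serialization: the break is written as <br/>
    $asHtml = $dom->saveHTML($div);   // HTML serialization: the break is written as <br>
    echo $asXml, "\n", $asHtml, "\n";
}
```

If the output is going back into an HTML page, the saveHTML() form is usually the closer match to the original markup.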
