When loading HTML into a <textarea>, I intend to treat different kinds of links differently. Consider the following links:

  1. <a href="http://stackoverflow.com">http://stackoverflow.com</a>
  2. <a href="http://stackoverflow.com">StackOverflow</a>

When the text inside a link matches its href attribute, I want to remove the HTML and keep only the URL; otherwise, the HTML should remain unchanged.

Here's my code:

$body = "Some HTML with a <a href=\"http://stackoverflow.com\">http://stackoverflow.com</a>";

$dom = new DOMDocument;
$dom->loadHTML($body, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

foreach ($dom->getElementsByTagName('a') as $node) {
    $link_text = $node->ownerDocument->saveHTML($node->childNodes[0]);
    $link_href = $node->getAttribute("href");
    $link_node = $dom->createTextNode($link_href);

    $node->parentNode->replaceChild($link_node, $node);
}

$html = $dom->saveHTML();

The problem with the above code is that DOMDocument wraps my HTML in a paragraph tag:

<p>Some HTML with a http://stackoverflow.com</p>

How do I get it to return only the inner HTML of that paragraph?

  • DOMDocument needs a root node to work. It creates one if there is none. You should add a root node before parsing and remove it manually afterwards... Hopefully there is a better solution. Commented Feb 22, 2018 at 14:29
  • It makes sense that there needs to be a root node. In that case, there might be no way around preg_replace('/(^<p>|<\/p>$)/', '', $html) Commented Feb 22, 2018 at 14:48

1 Answer

You need to have a root node to have a valid DOM document.

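I suggest wrapping the input in a root <div> of your own, so that you don't end up destroying a root element that may already exist in it.

Finally, either read the nodeValue of the root node (which returns the text only, without any markup) or strip the wrapper with substr().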

$body = "Some HTML with a <a href=\"http://stackoverflow.com\">http://stackoverflow.com</a>";
$body = '<div>'.$body.'</div>';

$dom = new DOMDocument;
$dom->loadHTML($body, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

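// Copy the live node list to an array: replacing nodes while iterating it can skip links.
foreach (iterator_to_array($dom->getElementsByTagName('a')) as $node) {
    $link_text = $node->textContent;
    $link_href = $node->getAttribute("href");

    // Only unwrap links whose text is identical to their href.
    if ($link_text === $link_href) {
        $link_node = $dom->createTextNode($link_href);
        $node->parentNode->replaceChild($link_node, $node);
    }
}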

$html = $dom->saveHTML();
$html = substr($html, 5, -7); // strip the leading "<div>" and the trailing "</div>\n"
var_dump($html); // "Some HTML with a http://stackoverflow.com"

This also works if the input string is:

<p>Some HTML with a <a href=\"http://stackoverflow.com\">http://stackoverflow.com</a></p>

It outputs:

<p>Some HTML with a http://stackoverflow.com</p>
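
If you would rather not depend on the exact length of the wrapper, the inner HTML can also be collected with DOMDocument itself by serializing each child of the wrapper. This is only a minimal sketch, assuming the wrapper <div> ends up as the document element; innerHtml() is a hypothetical helper name (not a built-in), and the second link was added to the sample input to show that non-matching links are kept.

```php
<?php
// Hypothetical helper: concatenates the serialized children of an element.
function innerHtml(DOMNode $element): string
{
    $html = '';
    foreach ($element->childNodes as $child) {
        // Passing a node to saveHTML() serializes only that node.
        $html .= $element->ownerDocument->saveHTML($child);
    }
    return $html;
}

$body = 'Some HTML with a <a href="http://stackoverflow.com">http://stackoverflow.com</a>'
      . ' and <a href="http://stackoverflow.com">StackOverflow</a>';

$dom = new DOMDocument;
$dom->loadHTML('<div>' . $body . '</div>', LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

// Iterate over a copy of the live node list, since nodes are replaced inside the loop.
foreach (iterator_to_array($dom->getElementsByTagName('a')) as $node) {
    $href = $node->getAttribute('href');
    // Unwrap only the links whose text equals their href.
    if ($node->textContent === $href) {
        $node->parentNode->replaceChild($dom->createTextNode($href), $node);
    }
}

// The wrapper <div> is the document element; its children are the original content.
$html = innerHtml($dom->documentElement);
```

Here $html should hold the original fragment with only the first link replaced by its URL, and no substr() offsets are involved.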
3 Comments

I would have preferred a DOMDocument way to retrieve the child node. However, I need to preserve some HTML (including some links), and your first method strips all HTML.
@idleberg I understand. I still suggest adding a root tag even if there already is one, because otherwise you could end up deleting the existing one.
@idleberg I've updated the answer. Please also see the last part.