Technically, "HTML comments" inside script tags are no longer HTML comments: the parser treats them as part of the script's text content. So with a DOM approach, these comments are not selected:
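// $html holds the markup to clean
$dom = new DOMDocument;
$dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$xp = new DOMXPath($dom);
// comment() matches real comment nodes only, not text inside <script>
$comments = $xp->query('//comment()');
// the XPath result is a snapshot, so removing nodes while iterating is safe
foreach ($comments as $comment) {
    $comment->parentNode->removeChild($comment);
}
$result = $dom->saveHTML();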
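To see this in action, here is a small self-contained sketch; the `$html` sample and the `$removed` counter are hypothetical, added only for illustration. The comment outside the script is removed, while the old-style `<!-- ... //-->` wrapper inside `<script>` is left alone because the parser stores it as script text, not as a comment node.

```php
<?php
// Hypothetical input: one real comment and an old-style "comment" wrapper inside <script>.
$html = "<div><!-- real comment --><script><!--\nvar answer = 42;\n//--></script><p>text</p></div>";

$dom = new DOMDocument;
$dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$xp = new DOMXPath($dom);

// The script body is parsed as text, so //comment() only finds the real comment.
$removed = 0;
foreach ($xp->query('//comment()') as $comment) {
    $comment->parentNode->removeChild($comment);
    $removed++;
}

$result = $dom->saveHTML();
echo $result, "\n";
```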
About conditional comments:
If you want to preserve conditional comments, you need to check the beginning of the comment. You can do it in two ways.
The first way is to test each comment inside the foreach loop and remove the node only when the test is negative.
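A sketch of that first way, assuming a hypothetical `$html` sample; the prefixes tested (`[if` at the very start, `<![endif]` after optional whitespace) are one reasonable choice, not the only one:

```php
<?php
// Hypothetical sample: one ordinary comment and one downlevel-hidden conditional comment.
$html = '<div><!-- ordinary --><!--[if IE]><p>IE only</p><![endif]--><p>text</p></div>';

$dom = new DOMDocument;
$dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$xp = new DOMXPath($dom);

foreach ($xp->query('//comment()') as $comment) {
    $text = $comment->nodeValue;
    // Conditional comments start with "[if" or, for a closing comment, "<![endif]".
    $isConditional = strpos($text, '[if') === 0
        || strpos(ltrim($text), '<![endif]') === 0;
    // Negative test: not a conditional comment, so remove it.
    if (!$isConditional) {
        $comment->parentNode->removeChild($comment);
    }
}

$result = $dom->saveHTML();
echo $result, "\n";
```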
But since you are using XPath (which lets you select exactly the nodes you want in a single query), you can follow the same logic and change the XPath query to:
//comment()[not(starts-with(., "[if") or starts-with(., "<![endif]"))]
The content between square brackets is called a "predicate" (a condition on the current node). The dot refers to that current node, or to its string value when a string is expected, as in starts-with(); for a comment node, the string value is the comment's text.
However, although this works most of the time, the slightest leading whitespace will make it fail. You need something more flexible than starts-with().
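The sketch below (hypothetical sample markup) runs a starts-with() predicate of that kind and collects what it would remove. The closing test is written against `<![endif]`, which is how the text of a `<!--<![endif]-->` comment begins. Because the closing comment here is written `<!-- <![endif]-->`, with one leading space, it lands in the removal list along with the ordinary comment:

```php
<?php
// Hypothetical sample: a downlevel-revealed block whose closing comment has a leading space.
$html = '<div><!--[if !IE]><!--><p>not IE</p><!-- <![endif]--><!-- ordinary --></div>';

$dom = new DOMDocument;
$dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$xp = new DOMXPath($dom);

// Selects every comment that does NOT start like a conditional comment.
$query = '//comment()[not(starts-with(., "[if") or starts-with(., "<![endif]"))]';

$toRemove = [];
foreach ($xp->query($query) as $comment) {
    $toRemove[] = $comment->nodeValue;
}
var_dump($toRemove);
```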
It is possible to register your own PHP function and call it from the XPath query, like this:
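// php:function() passes the node-set "." as an array of DOM nodes
function isConditionalComment($commentNode) {
    // "[if" must open the comment; "<![endif]" may follow leading whitespace
    return preg_match('~\A(?:\[if\s|\s*<!\[endif])~', $commentNode[0]->nodeValue);
}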
$dom = new DOMDocument;
$dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$xp = new DOMXPath($dom);
$xp->registerNamespace('php', 'http://php.net/xpath');
$xp->registerPHPFunctions('isConditionalComment');
$comments = $xp->query('//comment()[not(php:function("isConditionalComment", .))]');
foreach ($comments as $comment) {
    $comment->parentNode->removeChild($comment);
}
$result = $dom->saveHTML();
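A self-contained run of the same approach, on a hypothetical sample containing a downlevel-hidden block, a downlevel-revealed block whose closing comment has a leading space, and an ordinary comment. The `array` parameter and `bool` return type are additions for clarity; they rely on php:function() passing a node-set argument as an array of DOM nodes:

```php
<?php
// Callback for php:function(); the node-set argument arrives as an array of DOM nodes.
function isConditionalComment(array $nodes): bool
{
    // "[if" must open the comment; "<![endif]" may be preceded by whitespace.
    return preg_match('~\A(?:\[if\s|\s*<!\[endif])~', $nodes[0]->nodeValue) === 1;
}

// Hypothetical sample mixing conditional and ordinary comments.
$html = '<div><!--[if IE]><p>IE</p><![endif]-->'
    . '<!--[if !IE]><!--><p>other</p><!-- <![endif]-->'
    . '<!-- ordinary --></div>';

$dom = new DOMDocument;
$dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$xp = new DOMXPath($dom);
$xp->registerNamespace('php', 'http://php.net/xpath');
$xp->registerPHPFunctions('isConditionalComment');

// Remove every comment the callback does not recognise as conditional.
$comments = $xp->query('//comment()[not(php:function("isConditionalComment", .))]');
foreach ($comments as $comment) {
    $comment->parentNode->removeChild($comment);
}

// List the comments that survived.
$kept = [];
foreach ($xp->query('//comment()') as $comment) {
    $kept[] = $comment->nodeValue;
}
print_r($kept);
```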
Note: DOMDocument doesn't support Microsoft's original "downlevel-revealed" syntax (the one almost nobody uses), which is not an HTML comment at all:
<![if !IE]>
<link href="non-ie.css" rel="stylesheet">
<![endif]>
This syntax triggers a warning (since it is not valid HTML), and the "tag" is ignored and disappears from the DOM tree.
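If the warning is a nuisance, one option (a sketch, not the only approach) is to collect libxml's diagnostics with `libxml_use_internal_errors()` and `libxml_get_errors()` instead of letting them surface as PHP warnings. The exact messages, and what happens to the `<![if !IE]>` and `<![endif]>` markers, may vary with the libxml2 version PHP is built against, so the check below only relies on the `<link>` element surviving:

```php
<?php
// Hypothetical input using the non-comment syntax shown in the note.
$html = '<div><![if !IE]><link href="non-ie.css" rel="stylesheet"><![endif]></div>';

// Collect libxml diagnostics instead of emitting PHP warnings.
$previous = libxml_use_internal_errors(true);

$dom = new DOMDocument;
$dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

// Report whatever the parser complained about, then restore the old setting.
foreach (libxml_get_errors() as $error) {
    echo trim($error->message), " (line {$error->line})\n";
}
libxml_clear_errors();
libxml_use_internal_errors($previous);

echo $dom->saveHTML(), "\n";
```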