
Using this regular expression:

preg_replace( '/<!--(?!<!)[^\[>].*?-->/', '', $output )

I'm able to remove all HTML comments from my page except for anything that looks like this:

<!--[if IE 6]>
    Special instructions for IE 6 here
<![endif]-->

How can I modify this so that it also preserves HTML comments that contain a unique phrase, such as "batcache"?

So that an HTML comment like this:

<!--
generated 37 seconds ago
generated in 0.978 seconds
served from batcache in 0.004 seconds
expires in 263 seconds
-->

won't be removed.


This code seems to do the trick:

preg_replace_callback( '/<!--([\s\S]*?)-->/', function( $c ) { return ( strpos( $c[1], '<![' ) !== false || strpos( $c[1], 'batcache' ) !== false ) ? $c[0] : ''; }, $output )
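
For reference, a closure can only be used as the replacement with preg_replace_callback; plain preg_replace expects a string or an array there. Below is a minimal sketch of that callback approach, spread over several lines. The helper name strip_comments_except, the $keep list and the sample input are illustrative assumptions, not part of the original code.

```php
<?php
// Hypothetical helper: removes HTML comments unless they contain one of the
// marker strings in $keep. Name and defaults are assumptions for illustration.
function strip_comments_except(string $html, array $keep = ['<![', 'batcache']): string
{
    $result = preg_replace_callback(
        '/<!--([\s\S]*?)-->/',
        function (array $m) use ($keep): string {
            foreach ($keep as $marker) {
                if (strpos($m[1], $marker) !== false) {
                    return $m[0]; // keep the whole comment untouched
                }
            }
            return ''; // drop every other comment
        },
        $html
    );

    // preg_replace_callback returns null on error; fall back to the input.
    return $result ?? $html;
}

$sample = '<p>a</p><!-- plain --><!--[if IE 6]>x<![endif]--><!-- served from batcache -->';
echo strip_comments_except($sample), "\n";
```

Keeping the phrases in an array means another exception can be added without touching the pattern itself.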
  • Why don't you use strip_tags and then add back the special conditional comments? Commented Feb 11, 2015 at 19:55
  • Don't use regular expressions to parse HTML. Use a proper HTML parsing module. You cannot reliably parse HTML with regular expressions, and you will face sorrow and frustration down the road. As soon as the HTML changes from your expectations, your code will be broken. See htmlparsing.com/php or this SO thread for examples of how to properly parse HTML with PHP modules that have already been written, tested and debugged. Commented Feb 11, 2015 at 19:56
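
To illustrate the parser-based route suggested in the comment above, here is a rough sketch using PHP's bundled DOMDocument and DOMXPath classes. It assumes the DOM extension is enabled and that the input is a complete HTML document; the sample markup is made up for the example. Note that saveHTML() re-serializes the whole document, so the output may differ from the input in whitespace and other details.

```php
<?php
// Sketch only: assumes ext-dom is available and $html is a full document.
$html = '<!DOCTYPE html><html><head><title>t</title></head><body>'
      . '<!-- plain --><p>a</p><!-- served from batcache --></body></html>';

$doc = new DOMDocument();
libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// Copy the matched comment nodes into an array before removing any of them.
foreach (iterator_to_array($xpath->query('//comment()')) as $comment) {
    $text = $comment->nodeValue;
    // Keep comments that mention batcache or look like conditional comments.
    if (strpos($text, 'batcache') === false && strpos($text, '<![') === false) {
        $comment->parentNode->removeChild($comment);
    }
}

$clean = $doc->saveHTML();
echo $clean;
```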

1 Answer

This should remove all the comments that don't contain "batcache" or "[endif]". A match runs from the opening <!-- to the closing -->, and the negative lookaheads reject any comment in which one of those strings appears.

$result = preg_replace("/<!--((?!batcache)(?!\\[endif\\])[\\s\\S])*?-->/", "", $str);
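
As a quick check, the pattern above can be run against the comments from the question. This is only a sketch: the sample string is assembled for illustration, and the pattern is written with single quotes so the backslashes need no doubling.

```php
<?php
// Same pattern as in the answer, single-quoted so "\[" and "\s" stay as written.
$pattern = '/<!--((?!batcache)(?!\[endif\])[\s\S])*?-->/';

// Illustrative input: one ordinary comment, one conditional comment,
// and one comment mentioning batcache.
$str = "<!-- plain -->\n"
     . "<!--[if IE 6]>\n    Special instructions for IE 6 here\n<![endif]-->\n"
     . "<!--\nserved from batcache in 0.004 seconds\n-->\n";

// Only the ordinary comment is matched; in the other two, the lookaheads
// stop the match before it can reach the closing "-->".
$result = preg_replace($pattern, '', $str);
echo $result;
```

Each character inside the comment is consumed only if neither "batcache" nor "[endif]" starts at that position, which is why a comment containing either string can never match as a whole.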

You can test it here.

As other users have already stated, it's not always safe to parse HTML with regex, but if you are reasonably sure what kind of HTML you will be parsing, it should work as expected. If the regex doesn't handle some particular use case, let me know.


3 Comments

Thanks man that's nearly exactly what I was looking for, but what happened to the conditional comment exceptions? I updated my question to show the code I got working. Also, I totally understand what @AndyLester was saying about regex parsing, but in this case—with a unique, unchanging condition—I would think it's OK.
I'm sorry, I misread the question. I thought you wanted to replace all the tags except for the ones containing batcache. I have modified the answer accordingly. In case you need more matches to exclude I think you can add another negative lookahead to the list in the format "(?!string)".
Maybe [endif] isn't perfect; you can replace it with <![ as in your solution if you prefer.
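
The idea from the comments above of adding one negative lookahead per excluded string can be sketched as a small pattern builder. The function name build_comment_pattern and the phrase "keep-me" are hypothetical; preg_quote is used so that phrases containing regex metacharacters, such as <![, are matched literally.

```php
<?php
// Hypothetical helper: builds the removal pattern from a list of phrases
// whose comments should be left in place.
function build_comment_pattern(array $phrases): string
{
    $lookaheads = '';
    foreach ($phrases as $phrase) {
        // preg_quote escapes regex metacharacters and the "/" delimiter.
        $lookaheads .= '(?!' . preg_quote($phrase, '/') . ')';
    }
    // Consume one character at a time, but never at the start of a kept phrase.
    return '/<!--(?:' . $lookaheads . '[\s\S])*?-->/';
}

$pattern = build_comment_pattern(['batcache', '<![', 'keep-me']);

$input  = '<!-- a --><!-- keep-me --><!--[if IE]>x<![endif]--><!-- batcache -->';
$output = preg_replace($pattern, '', $input);
echo $output, "\n";
```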
