0

How would I remove everything from an HTML input except the comments? For example, this:

<html><body><!-- hello paragraph --><p>hello</p></body></html>

would turn into this:

<!-- hello paragraph -->

How would I do this? Thanks!

Edit: I know you can do things like this with regular expressions, but I don't know how.

3 Answers

1

Instead of replacing HTML, I'd extract all comments using:

preg_match_all('#(<!--.*?-->)#s', '<html><body><!-- hello paragraph --><p>hello</p></body></html>', $m);
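
To get from the matches to the comments-only string the question asks for, the results can be joined back together. A minimal sketch, assuming the input is held in a variable (the names `$html` and `$commentsOnly` are illustrative and not from the question):

```php
<?php
// Hypothetical input variable; the markup is the example from the question.
$html = '<html><body><!-- hello paragraph --><p>hello</p></body></html>';

// The s modifier lets "." match newlines, so multi-line comments are captured too.
preg_match_all('#<!--.*?-->#s', $html, $m);

// $m[0] holds every full match; joining them yields a comments-only string.
$commentsOnly = implode('', $m[0]);

echo $commentsOnly, PHP_EOL; // prints: <!-- hello paragraph -->
```

Whether the comments should be concatenated directly or separated (for example with a newline) depends on what the output is used for; `implode('')` is just one choice.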

1 Comment

I believe this approach would be more robust than trying to identify non-comments. It also wouldn't have the drawback that HTML tags inside comments would be mangled or removed.
0

That's indeed a bit more complex, but doable with regular expressions:

$text = preg_replace('~<(?!!--)/?\w[^>]*(?<!--)>~', "", $text);

This strips the tags in your example, although text between tags (such as "hello") is left in place, and it can fail on other input. Amusingly, it also removes HTML tags from within comments.

$regex = '~
    <             # opening html bracket
    (?!!--)       # negative assertion, no "!--" may follow
    /?\w          # optional / followed by a word character starting the tag name
    [^>]*         # matches html tag innards
    (?<!--)       # lookbehind assertion, no "--" before closing >
    >             # closing bracket
 ~x';
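
The caveats mentioned above can be seen on a small input. This is only a sketch: the sample string and the variable names `$pattern` and `$stripped` are assumptions for illustration, not part of the question.

```php
<?php
// Same pattern as in this answer: remove anything that looks like a tag,
// but leave the comment delimiters alone.
$pattern = '~<(?!!--)/?\w[^>]*(?<!--)>~';

// Hypothetical input with a tag nested inside a comment.
$text = '<div><!-- <b>bold</b> note --><p>hi</p></div>';

$stripped = preg_replace($pattern, '', $text);

echo $stripped, PHP_EOL; // prints: <!-- bold note -->hi
```

The `<b>` tags inside the comment are gone, and the text node "hi" survives because the pattern only removes tags, not the content between them.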

0
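$foo = "<html><body><!-- hello paragraph --><p>hello</p></body></html>";
// The s modifier lets "." span newlines, so multi-line comments match too.
// preg_match() returns only the first comment; $result[1] is its inner text.
preg_match('/<!--(.*?)-->/s', $foo, $result);
print_r($result);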
