
I wrote a PHP script that fetches email content.

The content is in HTML format.

I'd like to display it as shown below:

<?php 
$email_content = '
    <html>
        <script>alert("XSS");</script>
        <body>
            <div>Line1</div>
            <div>Line2</div>
        </body>
    </html>
';
echo $email_content;
?>

As you can see, echoing it directly allows an XSS attack. But if I pass it through the htmlspecialchars() function, the HTML is shown as escaped text instead of being rendered. What should I do in this case? Thanks.

  • Use something like htmlpurifier.org or take a look here stackoverflow.com/questions/7130867/… Commented Jun 20, 2013 at 7:37
  • Here I can see an error: <script>alert('XSS');</script>. It should be <script>alert(\'XSS\');</script> Commented Jun 20, 2013 at 7:38
  • You can use htmlspecialchars_decode($str, ENT_NOQUOTES) after using htmlspecialchars(). Commented Jun 20, 2013 at 7:58
  • @RajeevRanjan Sorry, it doesn't work. Commented Jun 20, 2013 at 8:15
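A quick way to see why the htmlspecialchars_decode() suggestion cannot help: decoding simply reverses the escaping, so the <script> tag comes back. A minimal sketch, using a short made-up sample string in place of the real email:

```php
<?php
// Escaping and then decoding is a round trip, not a sanitizer:
// htmlspecialchars_decode() turns &lt; and &gt; back into < and >,
// so the <script> element reappears unchanged.
$email_content = '<div>Line1</div><script>alert("XSS");</script>';

$escaped = htmlspecialchars($email_content);                // tags become &lt;...&gt;
$decoded = htmlspecialchars_decode($escaped, ENT_NOQUOTES); // tags are restored

echo $escaped, "\n";
echo $decoded, "\n";
```

ENT_NOQUOTES only leaves &quot; un-decoded; it does nothing about tags, which is why the script is still there.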

2 Answers


HTMLPurifier can do that:

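require_once '/path/to/HTMLPurifier.auto.php';

// The default config whitelists safe tags/attributes and strips scripts
$config = HTMLPurifier_Config::createDefault();
$purifier = new HTMLPurifier($config);
// $dirty_html is the untrusted markup, e.g. $email_content from the question
$clean_html = $purifier->purify($dirty_html);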

It takes dirty HTML (i.e. HTML that may contain JavaScript) and removes any script.

PHP doesn't have anything native or built in that can remove JavaScript the way HTMLPurifier does. You could use DOMDocument, but this would be a lengthy task because JavaScript can execute in some attributes (onerror, onclick) and is not limited to <script></script>.
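To give a feel for why the DOMDocument route is lengthy, here is a rough sketch of the idea, for illustration only. strip_dangerous_html() is a hypothetical helper name, and the filter is deliberately incomplete: it handles <script> elements, on* attributes and javascript: URLs, but ignores CSS, SVG, data: URLs and many other vectors, so it should not replace HTMLPurifier.

```php
<?php
// Illustrative only: a naive DOMDocument-based cleaner, NOT a complete XSS filter.
// strip_dangerous_html() is a hypothetical helper, not a PHP built-in.
function strip_dangerous_html(string $html): string
{
    $doc = new DOMDocument();
    // Real-world email HTML is rarely valid; collect parser warnings silently
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    // 1. Remove <script> elements (copy the live node list before changing the tree)
    foreach (iterator_to_array($doc->getElementsByTagName('script')) as $script) {
        $script->parentNode->removeChild($script);
    }

    // 2. Remove event-handler attributes (onclick, onerror, ...) and javascript: URLs
    $xpath = new DOMXPath($doc);
    foreach (iterator_to_array($xpath->query('//@*')) as $attr) {
        $name  = strtolower($attr->nodeName);
        $value = strtolower(trim($attr->nodeValue));
        if (strpos($name, 'on') === 0 || strpos($value, 'javascript:') === 0) {
            $attr->ownerElement->removeAttributeNode($attr);
        }
    }

    return $doc->saveHTML();
}

$email_content = '<html><script>alert("XSS");</script><body>'
    . '<div>Line1</div><img src="x.png" onerror="alert(1)"></body></html>';
echo strip_dangerous_html($email_content);
```

Every extra vector means another rule like step 2, which is exactly the maintenance burden a dedicated library takes off your hands.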


4 Comments

I just tried it, and it's not bad. But it removed content I didn't expect it to. The email content can be very complex, and HTMLPurifier doesn't seem to handle it reliably.
What did it remove? HTMLPurifier has lots of config options to change what it does or doesn't remove. The default config might not be exactly what you want.
The signature line. There is nothing unsafe in it. The weird thing is that some signatures appear and others are removed, so it seems unreliable. But maybe you're right; I need to dive into the config.
I figured it out. My email content has multiple <html> tags, and HTMLPurifier only fetches the <div> content from within the <html> tag (see the tokenizeHTML function), so I needed to split the content with explode() and use the purifyArray() function.
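A minimal sketch of that workaround, assuming each embedded document ends with a closing </html> tag. split_html_documents() is a hypothetical helper; it uses preg_split() rather than explode() so the closing tag stays attached to each piece, and the HTMLPurifier call only runs if the library has been loaded as in the answer above.

```php
<?php
// Split an email body containing several <html> documents into separate pieces.
// split_html_documents() is hypothetical; it assumes each document ends with </html>.
function split_html_documents(string $content): array
{
    // Split right after each </html> (case-insensitive), keeping the tag itself
    $parts = preg_split('~(?<=</html>)~i', $content, -1, PREG_SPLIT_NO_EMPTY);
    // Trim and drop whitespace-only leftovers between documents
    return array_values(array_filter(array_map('trim', $parts), 'strlen'));
}

$email_content = "<html><body><div>Part 1</div></body></html>\n"
    . "<html><body><div>Signature</div></body></html>";
$documents = split_html_documents($email_content);

// Purify each document separately (requires HTMLPurifier.auto.php to be loaded)
if (class_exists('HTMLPurifier')) {
    $purifier = new HTMLPurifier(HTMLPurifier_Config::createDefault());
    $clean = $purifier->purifyArray($documents);
    echo implode("\n", $clean);
}
```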

You should use the strip_tags() function and allow only the tags you want users to add.

echo strip_tags($text, '<p><a>');

This line keeps <p> and <a> tags; every other tag is removed.
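One caveat worth seeing concretely: strip_tags() removes disallowed tags, but it does not touch the attributes of allowed ones. A short check with a made-up sample string:

```php
<?php
// strip_tags() drops the <script> tags, but the onclick handler on the
// allowed <a> tag passes through unchanged.
$text = '<p>Hello</p><script>alert(1)</script><a href="#" onclick="alert(2)">link</a>';

$result = strip_tags($text, '<p><a>');
echo $result, "\n";
```

So whitelisting tags alone still leaves event-handler attributes as an attack vector.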

htmlspecialchars() works completely differently.

From the manual:

The translations performed are:

 '&' (ampersand) becomes '&amp;'
 '"' (double quote) becomes '&quot;' when ENT_NOQUOTES is not set.
 "'" (single quote) becomes '&#039;' (or &apos;) only when ENT_QUOTES is set.
 '<' (less than) becomes '&lt;'
 '>' (greater than) becomes '&gt;'
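To see those translations in action, here is a small check with a made-up sample string, comparing the ENT_QUOTES and ENT_NOQUOTES flags:

```php
<?php
$input = '<a href="x">Tom & \'Jerry\'</a>';

// ENT_QUOTES converts both quote types; with the default HTML 4.01
// doctype flag the single quote becomes &#039;
$quotes = htmlspecialchars($input, ENT_QUOTES);
// ENT_NOQUOTES leaves both quote types as they are
$noquotes = htmlspecialchars($input, ENT_NOQUOTES);

echo $quotes, "\n";   // &lt;a href=&quot;x&quot;&gt;Tom &amp; &#039;Jerry&#039;&lt;/a&gt;
echo $noquotes, "\n"; // &lt;a href="x"&gt;Tom &amp; 'Jerry'&lt;/a&gt;
```

In both cases the markup is turned into visible text, which is why the email no longer renders as HTML.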

There is a very nice article about XSS and CSRF prevention; read it.

4 Comments

What if I need the <img> tag, but someone uses it for an XSS attack?
strip_tags() is not good enough because XSS can be present in attributes; it's not limited to <script></script>. Also, in some circumstances strip_tags() can be bypassed.
@Cynial it's not an XSS attack, it's CSRF then. Read more about CSRF here: stackoverflow.com/questions/1780687/preventing-csrf-in-php
@Robert that is wrong. An <img> tag can also trigger XSS. Example: <img src="not_existent.png" onerror="alert('XSS')">... Never output untrusted HTML. It's a battle you can hardly win; there is an endless number of attack vectors.
