I need to check whether user-submitted HTML contains any JavaScript. I'm using PHP for validation.
-
It may be a better idea to define an accepted subset of HTML (tags, attributes, and contents) and strip anything that is not accepted. It saves some nasty surprises. – Toon Krijthe, Oct 1, 2009 at 10:36
-
Do you absolutely need to detect the presence of JavaScript code, or just make sure the code is sanitized/made non-executable? – code_burgar, Oct 1, 2009 at 10:37
-
Ideally, if it contains any sort of JavaScript, I would show an error to the user who submitted the HTML at the submission stage and reject the submission. – David, Oct 1, 2009 at 12:08
5 Answers
If you want to protect yourself against Cross-Site Scripting (XSS), you should use a whitelist rather than a blacklist, because there are too many vectors to consider when looking for XSS attacks.
Just make a list of all HTML tags and attributes you want to allow and remove/escape all other tags/attributes. And for those attributes that can be used for XSS attacks, validate the values to only allow harmless values.
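A minimal sketch of that whitelist idea, assuming PHP's bundled DOM extension is available. The function names `sanitizeHtml` and `isSafeUrl`, the tag list, and the accepted URL schemes are illustrative assumptions, not something given in this thread. Disallowed elements are removed together with their contents, which is the stricter of the two options (remove vs. escape).

```php
<?php
// Hypothetical helper: accept only http(s) and mailto URLs in href.
function isSafeUrl(string $url): bool
{
    return preg_match('#^(https?://|mailto:)#i', trim($url)) === 1;
}

// Hypothetical sanitizer: keeps only whitelisted tags and attributes.
function sanitizeHtml(string $html): string
{
    // Illustrative whitelist: tag name => allowed attribute names.
    $allowed = [
        'p' => [], 'b' => [], 'i' => [], 'em' => [], 'strong' => [],
        'ul' => [], 'ol' => [], 'li' => [], 'br' => [],
        'a' => ['href'],
    ];

    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // silence warnings on sloppy markup
    // Wrap the fragment in a <div> so it can be extracted again afterwards;
    // the leading XML declaration is a common trick to hint UTF-8 to libxml.
    $doc->loadHTML('<?xml encoding="UTF-8"><div>' . $html . '</div>');
    libxml_clear_errors();

    $root = $doc->getElementsByTagName('div')->item(0);
    if ($root === null) {
        return '';
    }

    // Copy the live node list first, because nodes are removed while looping.
    foreach (iterator_to_array($root->getElementsByTagName('*')) as $el) {
        $tag = strtolower($el->nodeName);
        if (!array_key_exists($tag, $allowed)) {
            // Not on the whitelist: drop the element and everything inside it.
            if ($el->parentNode !== null) {
                $el->parentNode->removeChild($el);
            }
            continue;
        }
        foreach (iterator_to_array($el->attributes) as $attr) {
            $name = strtolower($attr->nodeName);
            $keep = in_array($name, $allowed[$tag], true);
            // Attributes that carry URLs also get their value validated.
            if ($keep && $name === 'href' && !isSafeUrl($attr->nodeValue)) {
                $keep = false;
            }
            if (!$keep) {
                $el->removeAttribute($attr->nodeName);
            }
        }
    }

    // Serialize only the children of the wrapper element.
    $out = '';
    foreach ($root->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}
```

Calling `sanitizeHtml()` on markup containing a `<script>` element, an `onclick` attribute, or a `javascript:` link should return the text and whitelisted tags with those parts removed. This sketch does not handle HTML comments or CSS, so treat it as a starting point rather than a complete filter.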
Comments
It might be better to take a different approach and use something like HTML Purifier to filter out anything that you don't want. I think it would be very difficult to safely remove every possibility of JavaScript without actually parsing the HTML properly.
1 Comment
You could remove the script tags, as Pawka states, using regular expressions. I found a thread on this here.
Basically it's:
$list = preg_replace('#<script[^>]*>.*?</script>#is', '', $list);
The code is from that page, not written by me.
3 Comments
You'll need to scan for <script> tags, but you'll also need to scan for event-handler attributes such as onclick="" or onmouseover="", which can contain JavaScript without any script tags.
1 Comment
<img src="javascript:whatever">
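The three vectors mentioned above (script elements, `on*` attributes, and `javascript:` URLs) can be checked at submission time with a parser rather than a regular expression. The following is a rough sketch assuming PHP's DOM extension; the function name `findScriptVectors` is hypothetical, and the list of checks is not exhaustive (for example, it ignores CSS and `data:` URLs), so it should not be the only line of defence.

```php
<?php
// Hypothetical detector: returns human-readable reasons why the markup
// could execute script. An empty array means none of the checks matched.
function findScriptVectors(string $html): array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // silence warnings on sloppy markup
    $doc->loadHTML('<?xml encoding="UTF-8"><div>' . $html . '</div>');
    libxml_clear_errors();

    $found = [];
    foreach ($doc->getElementsByTagName('*') as $el) {
        $tag = strtolower($el->nodeName);
        if ($tag === 'script') {
            $found[] = '<script> element';
        }
        foreach ($el->attributes as $attr) {
            $name = strtolower($attr->nodeName);
            // Drop whitespace and control characters before comparing,
            // since they can be used to disguise the scheme.
            $value = strtolower(preg_replace('/[\x00-\x20]+/', '', $attr->nodeValue));
            if (strpos($name, 'on') === 0) {
                $found[] = "event handler attribute \"$name\" on <$tag>";
            } elseif (strpos($value, 'javascript:') === 0) {
                $found[] = "javascript: URL in \"$name\" on <$tag>";
            }
        }
    }
    return $found;
}

// Usage: reject the submission when anything is reported.
$problems = findScriptVectors('<img src="javascript:whatever">');
if ($problems !== []) {
    echo "Submission rejected:\n - " . implode("\n - ", $problems) . "\n";
}
```

Returning the list of reasons, rather than a plain boolean, makes it possible to tell the submitting user what to fix.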