XSS - Which HTML Tags and Attributes can trigger Javascript Events?

Question

I'm trying to code a secure and lightweight white-list based HTML purifier which will use DOMDocument. In order to avoid unnecessary complexity I am willing to make the following compromises:

HTML comments are removed
script and style tags are stripped altogether
only the child nodes of the body tag will be returned
all HTML attributes that can trigger Javascript events will either be validated or removed

I've been reading a lot about XSS attacks and prevention, and I hope I'm not being too naive (if I am, please let me know!) in assuming that if I follow all the rules I mentioned above, I will be safe from XSS.

The problem is I am not sure what other tags and attributes (in any [X]HTML version and/or browser versions/implementations) can trigger Javascript events, besides the default Javascript event attributes:

onAbort
onBlur
onChange
onClick
onDblClick
onDragDrop
onError
onFocus
onKeyDown
onKeyPress
onKeyUp
onLoad
onMouseDown
onMouseMove
onMouseOut
onMouseOver
onMouseUp
onMove
onReset
onResize
onSelect
onSubmit
onUnload

Are there any other non-default or proprietary event attributes that can trigger Javascript (or VBScript, etc...) events or code execution? I can think of href, style and action, for instance:

<a href="javascript:alert(document.location);">XSS</a> // or
<b style="width: expression(alert(document.location));">XSS</b> // or
<form action="javascript:alert(document.location);"><input type="submit" /></form>

I will probably just remove any style attributes from the HTML tags. The action and href attributes pose a bigger challenge, but I think the following code is enough to make sure their value is either a relative or absolute URL and not some nasty Javascript code:

$value = $attribute->value;

if ((strpos($value, ':') !== false) && (preg_match('~^(?:(?:s?f|ht)tps?|mailto):~i', $value) == 0))
{
    $node->removeAttributeNode($attribute);
}
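A slightly stricter variant of the same check can be sketched as follows. The function name isSafeUrl and the exact scheme list are assumptions made for illustration; the idea is to pull out the scheme explicitly and compare it against an allowlist, after removing whitespace and control characters that some browsers have been reported to ignore inside a scheme.

```php
<?php
// Hypothetical helper: true for relative URLs and for URLs with an allowed scheme.
function isSafeUrl(string $value): bool
{
    // Assumption: roughly the schemes the regular expression above accepts.
    $allowed = ['http', 'https', 'ftp', 'ftps', 'sftp', 'mailto'];

    // Drop ASCII whitespace and control characters so that "java\tscript:"
    // is compared as "javascript:".
    $clean = preg_replace('/[\x00-\x20]+/', '', $value);

    // A colon that appears before any "/", "?" or "#" terminates a scheme.
    // Without one, the value is treated as a relative URL.
    if (!preg_match('~^([^/?#:]*):~', $clean, $m)) {
        return true;
    }

    return in_array(strtolower($m[1]), $allowed, true);
}
```

With this sketch, isSafeUrl('/page?time=10:30') is accepted as relative, while isSafeUrl("java\tscript:alert(1)") is rejected. A value such as 'http:javascript:alert(1)' still passes, because its scheme is http.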

So, my two obvious questions are:

Am I missing any tags or attributes that can trigger events?
Is there any attack vector that is not covered by these rules?

After a lot of testing, pondering and researching, I've come up with the following (rather simple) implementation, which appears to be immune to every XSS attack vector I could throw at it.

I highly appreciate all your valuable answers, thanks.

Your check for a URI might be fooled if a browser supports malformed URLs like http:jascript:alert(....
– hakre, Commented Aug 7, 2011 at 22:04
Well, there are many possible variations on evaluating Javascript, like encoding and decoding, eval, external Javascript files and so on. Basically, there is no known method that will prevent a user from doing bad things. You can try to escape tags, words and quotes, but it can still be possible to inject XSS through interesting methods. I would suggest reading WhiteHat Security on this issue; maybe you can find something useful?
– Igoris, Commented Aug 7, 2011 at 22:04
That does not sound "white-list based". A whitelist-based approach would be to only copy tags and attributes that you know to be harmless. You don't need a list of harm_ful_ attributes for that.
– hmakholm left over Monica, Commented Aug 7, 2011 at 22:08
@hakre: As long as no Javascript is executed I don't really care if the link is broken. From my limited tests (and I don't have a plethora of OSes and browsers) that snippet (and some other variations) won't work.
– Alix Axel, Commented Aug 7, 2011 at 22:08
@Henning: I should have made it clearer... Tags must always be white-listed (script and style will always be removed, however). Tag attributes can be white-listed or not (allow all attributes, which should be internally sanitized or black-listed). If you allow the a tag, you probably also need to allow the href attribute, and you still have the same problem. That's why I thought of a second-pass black-list approach, since white-listing all possible tag attribute values would be way too cumbersome and highly susceptible to human error.
– Alix Axel, Commented Aug 7, 2011 at 22:15

Mike Samuel · Accepted Answer · 2011-08-18 02:44:00Z

11

You mention href and action as places javascript: URLs can appear, but you're missing the src attribute among a bunch of other URL loading attributes.

Line 399 of the OWASP Java HTMLPolicyBuilder is the definition of URL attributes in a white-listing HTML sanitizer.

private static final Set<String> URL_ATTRIBUTE_NAMES = ImmutableSet.of(
  "action", "archive", "background", "cite", "classid", "codebase", "data",
  "dsync", "formaction", "href", "icon", "longdesc", "manifest", "poster",
  "profile", "src", "usemap");
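Since the question uses PHP and DOMDocument, the same attribute list could be applied roughly like this. This is only a sketch: the function name stripUnsafeUrlAttributes and the scheme list passed to it are assumptions, and the scheme test is a simplified stand-in for whatever URL validation the sanitizer actually uses.

```php
<?php
// URL-valued attribute names, copied from the Java set quoted above.
const URL_ATTRIBUTES = [
    'action', 'archive', 'background', 'cite', 'classid', 'codebase', 'data',
    'dsync', 'formaction', 'href', 'icon', 'longdesc', 'manifest', 'poster',
    'profile', 'src', 'usemap',
];

// Hypothetical helper: removes URL attributes whose scheme is not allowed.
function stripUnsafeUrlAttributes(DOMDocument $doc, array $allowedSchemes): void
{
    foreach ($doc->getElementsByTagName('*') as $element) {
        // Snapshot the attributes first, because the attribute map is live.
        foreach (iterator_to_array($element->attributes, false) as $attribute) {
            if (!in_array(strtolower($attribute->name), URL_ATTRIBUTES, true)) {
                continue;
            }
            // Ignore whitespace and control characters when reading the scheme.
            $value = preg_replace('/[\x00-\x20]+/', '', $attribute->value);
            if (preg_match('~^([^/?#:]*):~', $value, $m)
                && !in_array(strtolower($m[1]), $allowedSchemes, true)) {
                $element->removeAttributeNode($attribute);
            }
        }
    }
}

$doc = new DOMDocument();
$doc->loadHTML('<a href="javascript:alert(1)">x</a><img src="/logo.png">');
stripUnsafeUrlAttributes($doc, ['http', 'https', 'mailto']);
echo $doc->saveHTML();
```

In this example the href on the link is removed, while the relative src on the image is left alone.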

The HTML5 Index contains a summary of attribute types. It doesn't mention some conditional things like <input type=URL value=...> but if you scan that list for valid URL and friends, you should get a decent idea of what HTML5 adds. The set of HTML 4 attributes with type %URI is also informative.

Your protocol whitelist looks very similar to the OWASP sanitizer one. The addition of ftp and sftp looks innocuous enough.

A good source of security-related schema info for HTML elements and attributes is the Caja JSON whitelists, which are used by the Caja JS HTML sanitizer.

How are you planning on rendering the resulting DOM? If you're not careful, then even if you strip out all the <script> elements, an attacker might get a buggy renderer to produce content that a browser interprets as containing a <script> element. Consider this valid HTML, which does not contain a script element:

<textarea><&#47;textarea><script>alert(1337)</script></textarea>

A buggy renderer might output the contents of this as:

<textarea></textarea><script>alert(1337)</script></textarea>

which does contain a script element.
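One way to avoid this class of bug, if the renderer is written by hand, is to escape every text node on output. The sketch below assumes a hypothetical renderText step in a custom serializer and uses PHP's htmlspecialchars; it is not taken from either sanitizer mentioned above.

```php
<?php
// Hypothetical serializer step: escape text so it can neither close the
// element that contains it nor open a new one.
function renderText(string $text): string
{
    return htmlspecialchars($text, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

// The textarea's text content as the DOM holds it after parsing.
$decoded = '</textarea><script>alert(1337)</script>';

echo '<textarea>' . renderText($decoded) . '</textarea>';
// prints <textarea>&lt;/textarea&gt;&lt;script&gt;alert(1337)&lt;/script&gt;</textarea>
```

The output keeps the text inside the textarea, so the browser never sees a script element.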

(Full disclosure: I wrote chunks of both HTML sanitizers mentioned above.)

edited Aug 18, 2011 at 2:44

answered Aug 7, 2011 at 22:32


6 Comments

Alix Axel Over a year ago

Your answer is gold, +10 if I could! I looked into the OWASP PHP Sanitizer but there was only an abstract class in there, the Java version however, seems very complete. I'm not exactly sure how to test if Javascript is being executed inside some of the attributes you mentioned, if you happen to know please let me know. The Google Caja whitelists are also pretty neat.

Alix Axel Over a year ago

Oh, as for the DOM rendering, my code is still pretty edgy but I was using the PHP built-in strip_tags() with the white-listed tags (to take some work out of the dom extension) and then traversing the remaining HTML nodes with DOMDocument removing the tag or attribute nodes if they weren't white-listed or were permanently black-listed. I think I've spotted some bugs in strip_tags() so I'll rewrite the whole thing using only the dom extension. I'm still testing if I need to use tidy before, or if dom is smart enough alone.

Alix Axel Over a year ago

my.opera.com/karlcow/blog/…

Alix Axel Over a year ago

stackoverflow.com/questions/2725156/…

Alix Axel Over a year ago

Sorry for posting all these links but I have a huge number of tabs opened ATM and they might interest someone. I "searched" the Caja attribute whitelist and did a little Googling and apparently the OWASP project doesn't contemplate the following (new) attributes: profile, manifest, poster, formaction, icon and longdesc. Also, I was unable to find any reference to the dsync attribute you mentioned, do you happen to have any?


Explosion Pills · Accepted Answer · 2011-08-08 02:20:04Z

5

Garuda has already given what I would deem as the "correct" answer, and his links are very useful, but he beat me to the punch!

I give my answer only to reinforce.

In this day and age of increasing features in the html and ecmascript specs, avoiding script injection and other such vulnerabilities in html becomes more and more difficult. With each new addition, a whole world of possible injections is introduced. This is coupled with the fact that different browsers probably have different ideas of how they are going to implement these specs, so you get even more possible vulnerabilities.

Take a look at a short list of vectors introduced by html 5

The best solution is to choose what you will allow rather than what you will deny. It is much easier to say "These tags and these attributes for those given tags alone are allowed. Everything else will be sanitized accordingly or thrown out."
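As a rough illustration of that rule with DOMDocument, the sketch below keeps only listed tags and, for each tag, only listed attributes. The allowlist contents and the name applyAllowlist are assumptions made for the example. Attribute values such as href would still need their own validation.

```php
<?php
// Assumption: an illustrative allowlist (tag name => permitted attribute names).
const ALLOWED_TAGS = [
    'p' => [],
    'b' => [],
    'i' => [],
    'a' => ['href', 'title'],
];

// Hypothetical helper: recursively removes everything not on the allowlist.
function applyAllowlist(DOMNode $parent): void
{
    // Snapshot the children: childNodes is a live list that shifts on removal.
    foreach (iterator_to_array($parent->childNodes, false) as $child) {
        if ($child->nodeType === XML_TEXT_NODE) {
            continue; // plain text is kept; the serializer must escape it
        }
        $name = strtolower($child->nodeName);
        if ($child->nodeType !== XML_ELEMENT_NODE || !array_key_exists($name, ALLOWED_TAGS)) {
            // Comments, CDATA sections and unlisted tags go, with their content.
            $parent->removeChild($child);
            continue;
        }
        foreach (iterator_to_array($child->attributes, false) as $attribute) {
            if (!in_array(strtolower($attribute->name), ALLOWED_TAGS[$name], true)) {
                $child->removeAttributeNode($attribute);
            }
        }
        applyAllowlist($child);
    }
}

libxml_use_internal_errors(true); // keep parser warnings out of the output
$doc = new DOMDocument();
$doc->loadHTML('<p onclick="x()">Hi <a href="/a" style="color:red">link</a></p><iframe src="//evil.example"></iframe>');
$body = $doc->getElementsByTagName('body')->item(0);
applyAllowlist($body);
echo $doc->saveHTML($body);
```

Nothing in this sketch needs to know that onclick, style or iframe are dangerous. They are removed simply because they are not listed.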

It would be very irresponsible for me to compile a list and say "okay, here you go: here's a list of all of the injection vectors you missed. You can sleep easy." In fact, there are probably many injection vectors that are not even known by black hats or white hats. As the ha.ckers website states, script injection is really only limited by the mind.

I'd like to answer your specific question at least a little bit, so here are some glaring omissions from your blacklist:

img src attribute. I think it is important to note that src is a valid attribute on other elements and could be potentially harmful. img also has dynsrc and lowsrc, and maybe even more.
type and language attributes
CDATA in addition to just html comments.
Improperly sanitized input values. This may not be a problem depending upon how strict your html parsing is.
Any ambiguous special characters. In my opinion, even unambiguous ones should probably be encoded.
Missing or incorrect quotes on attributes (such as grave quotes).
Premature closing of textarea tags.
UTF-8 (and 7) encoded characters in scripts
Even though you will only return child nodes of the body tag, many browsers will still evaluate head, and html elements inside of body, and most head-only elements inside of body anyway, so this probably won't help much.
In addition to css expressions, background image expressions
frames and iframes
embed and probably object and applet
Server side includes
PHP tags
Any other injections (SQL Injection, executable injection, etc.)

By the way, I'm sure this doesn't matter, but camelCased attributes are invalid xhtml and should be lower cased. I'm sure this doesn't affect you.

edited Aug 8, 2011 at 2:20

answered Aug 7, 2011 at 22:45


3 Comments

Alix Axel Over a year ago

Good answer, +1. The link you mention relies on the onfocus attribute. If that is removed, the new autofocus attribute poses no danger by itself, so I don't consider it a new attack vector; it only "empowers" an existing one.

Alix Axel Over a year ago

As for the points you mentioned, some of them can be a problem, but then again I use a white-list based approach. I'm just trying to compile a black-list that will always be used internally (i.e. not configurable). The src attribute is especially worrying. I'll test everything you mentioned and see how it goes.

Mike Samuel Over a year ago

When you say "xss expression" did you mean "CSS expression" as in <div style="height: expression(alert(1337))"> for IE and moz-binding and similar for other browsers?

Garuda · Accepted Answer · 2011-08-07 22:11:54Z

2

You might want to check these 2 links out for additional reference:

http://adamcecc.blogspot.com/2011/01/javascript.html (this is only applicable when your 'filtered' input is ever going to find itself between script tags on a page)

http://ha.ckers.org/xss.html (which has a lot of browser-specific event triggers listed)

I've used HTML Purifier, as you are doing, for this reason too, in combination with a WYSIWYG editor. What I did differently was use a very strict whitelist with a couple of basic markup tags and attributes available, expanding it when the need arose. This keeps you from getting attacked by very obscure vectors (like the first link above), and you can dig into each newly needed tag or attribute one by one.

Just my 2 cents..

answered Aug 7, 2011 at 22:11


3 Comments

Alix Axel Over a year ago

Thanks, I was aware of the XSS Cheat Sheet. The first link looks interesting; however, the poster sums up exactly what I am trying to avoid in a single sentence: "That's right this is an alert() if it lands anywhere in an executable section of JavaScript/dom it pops up the cookie". HTML Purifier is the de facto standard for PHP HTML sanitization, but it's also heavy as hell. I'm hoping that if I make some small compromises I will be able to come up with something that is much more lightweight.

Garuda Over a year ago

You're absolutely right about the heavyweight overkill :) Although I have yet to experience obvious delays in page loading due to sanitizing with HTML Purifier. Since it's a pretty well developed framework, I'm pretty sure the developers are aware of the performance issues and most likely made it as lightweight and fast as possible. If it's still a problem for you personally, I would recommend looking into PHP code caching (like APC) or something similar from Zend.

mpgn Over a year ago

all page not found

Pierre Ernst · Accepted Answer · 2011-08-08 12:48:38Z

0

Don't forget the HTML5 JavaScript event handlers

http://www.w3schools.com/html5/html5_ref_eventattributes.asp

answered Aug 8, 2011 at 12:48


2 Comments

Alix Axel Over a year ago

+1, I didn't know about those new attributes. I checked, and it seems that all attributes that start with on are Javascript event triggers. I will probably just remove everything that matches that pattern.

molnarg Over a year ago

Please don't reference w3schools, use more reputable sources. See w3fools.com
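The pattern suggested in the comments above, dropping every attribute whose name starts with on, might look like this with DOMDocument. The function name stripEventHandlers is an assumption made for the sketch, and this covers event handler attributes only, not URL or style attributes.

```php
<?php
// Hypothetical helper: drops every attribute whose name begins with "on"
// and returns how many were removed.
function stripEventHandlers(DOMDocument $doc): int
{
    $removed = 0;
    foreach ($doc->getElementsByTagName('*') as $element) {
        // Snapshot the attributes first, because the attribute map is live.
        foreach (iterator_to_array($element->attributes, false) as $attribute) {
            // stripos() makes the prefix test case-insensitive.
            if (stripos($attribute->name, 'on') === 0) {
                $element->removeAttributeNode($attribute);
                $removed++;
            }
        }
    }
    return $removed;
}

$doc = new DOMDocument();
$doc->loadHTML('<img src="a.png" onerror="alert(1)" ONLOAD="alert(2)">');
echo stripEventHandlers($doc);
```

A prefix match like this also removes any future on* attribute without the list having to be updated, at the cost of dropping a harmless attribute that happens to start with on.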
