0

I would like to allow users to post HTML to a site but need to ensure that no Javascript is injected into the site.

So far I have created a validation attribute to check the incoming html for dodgy doings

[AttributeUsage(AttributeTargets.Property, 
    AllowMultiple = false, Inherited = true)]
public class CheckHtml : ValidationAttribute, IMetadataAware {

    private static Regex _check = new Regex(
        @"<script[^>]*>.*?<\/script>|<[^>]*(click|mousedown|mouseup|mousemove|keypress|keydown|keyup)[^>]*>",
        RegexOptions.Singleline|RegexOptions.IgnoreCase|RegexOptions.Compiled);

    protected override ValidationResult IsValid(
        object value, ValidationContext validationContext) {

        if(value!=null
            && _check.IsMatch(value.ToString())){

            return new ValidationResult("Content is not acceptable");
        }

        return ValidationResult.Success;
    }

    /// <summary>
    /// <para>Allow Html</para>
    /// </summary>
    public void OnMetadataCreated(ModelMetadata metadata) {
        if (metadata == null) {
            throw new ArgumentNullException("metadata");
        }
        metadata.RequestValidationEnabled = false;
    }
}

Is this going to be enough? What do you do to check for such naughtyness?

3 Answers 3

3

Take a look at the Microsoft AntiXSS library. It boasts a AntiXSS.GetSafeHtmlFragment() method which returns the HTML stripped of all the XSS-badness.

As David has pointed out, a white list is always the way to go. AntiXSS uses a whitelist of HTML elements/attributes that are safe against XSS / filters out JavaScript.

Sign up to request clarification or add additional context in comments.

Comments

2

Is this going to be enough?

No. It is a blacklist. A blacklist is never enough.

No. It is a regular expression. Regular expressions are rubbish at dealing with arbitrary HTML.

What do you do to check for such naughtyness?

A proper HTML parser combined with a whitelist.

Comments

0

Jeff Atwood had a nice discussion of this topic over on refactor my code. Definitely worth the time to check it out: http://refactormycode.com/codes/333-sanitize-html

The final refactored version should be pretty solid. Security is never a 100% type of thing but this is probably better than most other examples floating around.

1 Comment

thanks, the white-list would have to be updated for html5, but that's a great start

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.