UPDATE (Feb 2020):
Microsoft's AntiXSS library has a static method called GetSafeHtmlFragment on its Sanitizer class, which seems to do the job. (suggested by @exploring.cheerily.impresses)
On .NET 4.5+ (or on older versions of .NET after adding System.Web.Security.AntiXss), there is a good way to address this issue: use [AllowHtml] together with a custom validation attribute. The attribute whitelists the HTML tags allowed in the string and strips all other tags when the model is validated.
Here is the custom annotation attribute for this job:
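using System;
using System.ComponentModel.DataAnnotations;
using System.Text.RegularExpressions;

[AttributeUsage(AttributeTargets.Property | AttributeTargets.Field, Inherited = true, AllowMultiple = false)]
public sealed class RemoveScriptAttribute : ValidationAttribute
{
    // Matches any opening or closing tag whose name is not a, b, i, or p.
    public const string DefaultRegexPattern = @"\<((?=(?!\b(a|b|i|p)\b))(?=(?!\/\b(a|b|i|p)\b))).*?\>";

    public string RegexPattern { get; }

    public RemoveScriptAttribute(string regexPattern = null)
    {
        RegexPattern = regexPattern ?? DefaultRegexPattern;
    }

    protected override ValidationResult IsValid(object value, ValidationContext ctx)
    {
        var valueStr = value as string;
        if (valueStr != null)
        {
            // Strip every non-whitelisted tag; the 250 ms timeout bounds the regex run time.
            var newVal = Regex.Replace(valueStr, RegexPattern, "", RegexOptions.IgnoreCase, TimeSpan.FromMilliseconds(250));
            if (newVal != valueStr && ctx.MemberName != null)
            {
                // Write the cleaned value back to the model property (skipped if no such property exists).
                var prop = ctx.ObjectType.GetProperty(ctx.MemberName);
                prop?.SetValue(ctx.ObjectInstance, newVal);
            }
        }

        // This attribute never rejects the input; it only rewrites it.
        return ValidationResult.Success;
    }
}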
Then decorate the model property that should accept HTML with both the [AllowHtml] and [RemoveScript] attributes, like this:
public class MyModel
{
    [AllowHtml, RemoveScript]
    public string StringProperty { get; set; }
}
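To see the rewrite happen outside of MVC, here is a minimal console sketch. It assumes a trimmed copy of the attribute (named RemoveScriptDemoAttribute here purely for illustration) and triggers validation by hand with Validator.TryValidateObject; DemoModel and ValidationDemo are hypothetical names, and [AllowHtml] is left out because it lives in System.Web.Mvc and only affects request validation.

```csharp
using System;
using System.Collections.Generic;
using System.ComponentModel.DataAnnotations;
using System.Text.RegularExpressions;

// Trimmed, illustrative copy of the attribute so this sketch compiles on its own.
[AttributeUsage(AttributeTargets.Property)]
public sealed class RemoveScriptDemoAttribute : ValidationAttribute
{
    private const string Pattern = @"\<((?=(?!\b(a|b|i|p)\b))(?=(?!\/\b(a|b|i|p)\b))).*?\>";

    protected override ValidationResult IsValid(object value, ValidationContext ctx)
    {
        if (value is string s && ctx.MemberName != null)
        {
            // Remove non-whitelisted tags and write the result back to the property.
            var cleaned = Regex.Replace(s, Pattern, "", RegexOptions.IgnoreCase, TimeSpan.FromMilliseconds(250));
            ctx.ObjectType.GetProperty(ctx.MemberName)?.SetValue(ctx.ObjectInstance, cleaned);
        }
        return ValidationResult.Success;
    }
}

// Hypothetical model used only for this demo.
public class DemoModel
{
    [RemoveScriptDemo]
    public string StringProperty { get; set; }
}

public static class ValidationDemo
{
    // Validates a model holding the given input and returns the (possibly rewritten) property value.
    public static string Run(string input)
    {
        var model = new DemoModel { StringProperty = input };
        var results = new List<ValidationResult>();
        // validateAllProperties: true is required, otherwise only [Required] attributes are evaluated.
        Validator.TryValidateObject(model, new ValidationContext(model), results, validateAllProperties: true);
        return model.StringProperty;
    }

    public static void Main()
    {
        Console.WriteLine(Run("Hello <u>world</u>"));
    }
}
```

The point of the sketch is that the attribute acts as a sanitizer rather than a validator: validation always succeeds, and the model simply ends up holding the cleaned string.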
This lets only the <a>, <b>, <i>, and <p> HTML tags through. All other tags are removed, but their inner text is kept. E.g. if you send:
"This is a <b>rich text</b> entered by <u>John Smith</u>."
you will end up getting this:
"This is a <b>rich text</b> entered by John Smith."
It is also easy to whitelist more HTML tags. E.g. if you want to accept <u></u>, <br />, and <hr />, either change DefaultRegexPattern (which affects every use of the attribute) or pass a modified regexPattern to an individual RemoveScriptAttribute, like this:
[AllowHtml]
[RemoveScript(regexPattern: @"\<((?=(?!\b(a|b|i|p|u|br|hr)\b))(?=(?!\/\b(a|b|i|p|u)\b))).*?\>")]
public string Body { get; set; }
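The effect of the two patterns can be compared directly, without MVC, by running them through Regex.Replace. The following is a small sketch under that assumption; TagFilterDemo and Strip are hypothetical names used only for illustration, and the sample input is made up.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TagFilterDemo
{
    // The default whitelist (a, b, i, p) and the extended one (adds u, br, hr), copied from above.
    public const string DefaultPattern = @"\<((?=(?!\b(a|b|i|p)\b))(?=(?!\/\b(a|b|i|p)\b))).*?\>";
    public const string ExtendedPattern = @"\<((?=(?!\b(a|b|i|p|u|br|hr)\b))(?=(?!\/\b(a|b|i|p|u)\b))).*?\>";

    // Removes every tag matched by the pattern and keeps the text in between.
    public static string Strip(string input, string pattern)
    {
        return Regex.Replace(input, pattern, "", RegexOptions.IgnoreCase, TimeSpan.FromMilliseconds(250));
    }

    public static void Main()
    {
        var input = "A <u>note</u><br />with <span>extra</span> <script>markup</script>";

        Console.WriteLine(Strip(input, DefaultPattern));
        // prints: A notewith extra markup

        Console.WriteLine(Strip(input, ExtendedPattern));
        // prints: A <u>note</u><br />with extra markup
    }
}
```

Note that both patterns look only at the tag name, so attributes on whitelisted tags (for example the href of an <a>) pass through unchanged.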