2
public static string MakeWebSafe(this string x) {
    const string RegexRemove = @"(<\s*script[^>]*>)|(<\s*/\s*script[^>]*>)";
    return Regex.Replace(x, RegexRemove, string.Empty, RegexOptions.IgnoreCase);
}

Is there any reason this implementation isn't good enough? Can you break it? Is there anything I haven't considered? If you use or have used something different, what are its advantages?

I'm aware this leaves the body of the script in the text, but that's okay for this project.

UPDATE

Don't do the above! In the end I went with the approach from "HTML Agility Pack strip tags NOT IN whitelist".

4 Comments
  • Instead of even trying to write a foolproof script and leaving open the possibility that you failed, why not just use an HTML parser like HTML Agility Pack? Commented Jun 13, 2011 at 15:09
  • Don't reinvent the wheel. Use a proven, solid security library. See Chris's answer. Commented Jun 13, 2011 at 15:10
  • If/when I get time, I will come back and improve it; I just need a quick-and-dirty solution in place for the time being. Commented Jun 13, 2011 at 15:19
  • Quick and dirty? It takes less than 5 minutes to download a library, add the reference to your project, and write a line or two that removes <script> elements from a string. Commented Jun 13, 2011 at 15:22
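
As a rough illustration of the "line or two" the comments describe, here is a minimal sketch that removes script elements with HTML Agility Pack. It assumes the third-party HtmlAgilityPack NuGet package is referenced; the class and method names (ScriptStripper, RemoveScriptElements) are hypothetical and not from the thread. Note that it only drops script elements. Event-handler attributes such as onclick and javascript: URLs are left alone, which is presumably why a whitelist of allowed tags is the more thorough route.

```csharp
using HtmlAgilityPack; // third-party package, assumed to be installed via NuGet

public static class ScriptStripper // hypothetical helper name
{
    // Parses the markup and removes every <script> element, including its body.
    public static string RemoveScriptElements(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null (not an empty collection) when nothing matches.
        var scripts = doc.DocumentNode.SelectNodes("//script");
        if (scripts != null)
        {
            foreach (var node in scripts)
            {
                node.Remove();
            }
        }

        return doc.DocumentNode.OuterHtml;
    }
}
```

Because the input goes through a parser rather than a pattern match, the result does not depend on how the tag happens to be spelled or spaced in the source text.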

2 Answers

4

Have you considered this kind of scenario?

<scri<script>pt type="text/javascript">
    causehavoc();
</scr</script>ipt>

The best thing to do is remove all tags, HTML-encode the output, or use BBCode.
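
To see what happens with that input, here is a small sketch that runs the question's pattern over a single-line version of the nested markup. The wrapper class name (WebSafeDemo) and the Main method are only there to make it runnable; the pattern itself is copied from the question.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WebSafeDemo // hypothetical wrapper so the sketch compiles on its own
{
    // Same pattern as the question's MakeWebSafe.
    private const string RegexRemove = @"(<\s*script[^>]*>)|(<\s*/\s*script[^>]*>)";

    public static string MakeWebSafe(string x)
    {
        return Regex.Replace(x, RegexRemove, string.Empty, RegexOptions.IgnoreCase);
    }

    public static void Main()
    {
        // Well-formed script tags nested inside broken ones.
        string input = "<scri<script>pt type=\"text/javascript\">causehavoc();</scr</script>ipt>";

        // Only the inner <script> and </script> match, so removing them
        // splices the outer fragments into a working script element.
        Console.WriteLine(MakeWebSafe(input));
        // prints: <script type="text/javascript">causehavoc();</script>
    }
}
```

A single Regex.Replace pass never looks at its own output, so the tags it assembles are not checked again.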


2 Comments

The only thing I want to prevent is any script actually running. It doesn't matter if its content or other leftovers survive for the project I am working on.
That is the point: your regex removes only the inner, well-formed script tags. The broken fragments around them then join up into valid script tags once the inner tags are gone, so the script still runs.
2

Yes, your regex can be circumvented by Unicode-encoding the script tags. I would suggest you look to more robust libraries when it comes to security. Take a look at the Microsoft Web Protection Library.

2 Comments

Very good point. Since this is an internal tool, I only need to implement the most basic solution and get something up and running. This is beyond the skill of anyone that is going to use the tool (not the greatest of excuses I know, but I'm hoping I get time to come back and improve it later).
It would be safer to HtmlEncode the strings on the way out; that way none of the markup will execute. Take a look at the WPL: it is really easy to integrate and has some nice tools for producing safe markup, which let through only the markup that is considered safe.
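
A minimal sketch of encoding on the way out, assuming the framework's built-in System.Net.WebUtility.HtmlEncode is acceptable in place of the WPL encoder mentioned above. The class and method names (EncodeDemo, RenderSafe) are hypothetical.

```csharp
using System;
using System.Net;

public static class EncodeDemo // hypothetical name, for illustration only
{
    // Encode at output time so any markup in the text is displayed, not executed.
    public static string RenderSafe(string userText)
    {
        return WebUtility.HtmlEncode(userText);
    }

    public static void Main()
    {
        Console.WriteLine(RenderSafe("<script>causehavoc();</script>"));
        // prints: &lt;script&gt;causehavoc();&lt;/script&gt;
    }
}
```

The trade-off is that every tag is neutralised, not just script tags, so this fits cases where no user-supplied markup needs to render.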
