7

I compose a large HTML file out of a huge unformatted text file. My fear is that the text file might contain some malicious JavaScript code. To avoid any damage I scan the text and replace every < and > with &lt; and &gt;. That is quite effective, but it's not great for performance.

Is there some tag or attribute or whatever that allows me to turn JavaScript off within the HTML file? In the header perhaps?

2
  • 2
    Where does the HTML come from, and how do you process it? You should tell us more so that we can help, because there is probably a better solution at the point where you take in the HTML code. Commented Oct 28, 2011 at 10:46
  • I am creating the HTML myself. Actually it's a big table whose columns are filled with the data I extract from a text file. Therefore I do have control over the basic HTML file, just not what is within the columns. Commented Oct 28, 2011 at 11:46

5 Answers

4

Since you've considered replacing all < and > with HTML entities, a good option would be to send the Content-Type: text/plain header, so the browser displays the file as text rather than parsing it as HTML.

If you want to show the contents of the file inside an HTML page, replacing every & with &amp; and every < with &lt; is sufficient to display the contents correctly. Example:
Input: Huge wall of text 1<a2 &>1
Output: Huge wall of text 1&lt;a2 &amp;>1
Unmodified output, displaying in browser: Huge wall of text 11 (<..> interpreted as HTML)
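The replacement above can be sketched in a few lines of Python. This is only an illustration, not part of the original answer: the function name `escape_text` is made up, and it assumes the text ends up in element content (for example inside a table cell), not inside an attribute value.

```python
def escape_text(text: str) -> str:
    """Escape the two characters that matter inside HTML element content."""
    # "&" has to be replaced first; otherwise the "&" of a freshly
    # inserted "&lt;" would itself be turned into "&amp;lt;".
    return text.replace("&", "&amp;").replace("<", "&lt;")


# Reproduces the example from the answer.
print(escape_text("Huge wall of text 1<a2 &>1"))
# prints: Huge wall of text 1&lt;a2 &amp;>1
```

Each `str.replace` call is a single pass over the string, so the cost stays linear in the size of the file.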

If you cannot modify code at the back end (server side), you need an HTML parser that sanitises the input. JavaScript is not the only threat; embedded content (<object>, <iframe>, ...) can also be very malicious. Have a look at the following answer for a very detailed HTML parser and sanitizer:
Can I load an entire HTML document into a document fragment in Internet Explorer?


3

If you control the back end, you can serve the file with this header:

Content-Type: text/plain
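As a rough sketch of what that looks like on the server, here is a minimal WSGI application in Python. The names `app` and `UNTRUSTED_TEXT` are placeholders, and the `X-Content-Type-Options: nosniff` header is my own addition (it asks browsers not to second-guess the declared type); neither comes from the answer itself.

```python
# Placeholder for the contents of the untrusted text file.
UNTRUSTED_TEXT = "1<a2 &>1 <script>alert(1)</script>"


def app(environ, start_response):
    """Serve the text unmodified, declared as plain text instead of HTML."""
    body = UNTRUSTED_TEXT.encode("utf-8")
    headers = [
        ("Content-Type", "text/plain; charset=utf-8"),
        # Assumption: added so browsers do not sniff the body as HTML.
        ("X-Content-Type-Options", "nosniff"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]
```

To try it locally, something like `wsgiref.simple_server.make_server("127.0.0.1", 8000, app).serve_forever()` should do. Note that this only helps if the whole response is the text; it does not fit the case where the text is embedded in an HTML table.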

2 Comments

@Truth: the same result would happen anyway if the OP is replacing all tag delimiters with encoded entities.
Seeing as he wanted to get rid of malicious scripts, the correct solution would be to sanitize those.
1

No, you can't disable JavaScript from inside a webpage. Instead, you should sanitize any and all input from your users to make sure no malicious scripts get through your script.

Whether it's by removing all script tags or by replacing < and >, you need to make sure your input is clean.
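For the table described in the question, one way to do this is to escape each value at the moment the row is assembled, so the markup you write yourself stays intact and only the cell contents are neutralised. The sketch below uses Python's standard `html.escape`; the function names `table_rows` and `render_table` and the sample data are hypothetical.

```python
import html
from typing import Iterable, Iterator


def table_rows(records: Iterable[Iterable[str]]) -> Iterator[str]:
    """Yield one <tr> per record, escaping every cell value."""
    for record in records:
        # html.escape replaces &, <, > and, by default, both quote characters.
        cells = "".join(f"<td>{html.escape(value)}</td>" for value in record)
        yield f"<tr>{cells}</tr>"


def render_table(records: Iterable[Iterable[str]]) -> str:
    """Wrap the escaped rows in a table element."""
    return "<table>\n" + "\n".join(table_rows(records)) + "\n</table>"


print(render_table([["1<a2", "<script>alert(1)</script>"]]))
```

Because `table_rows` is a generator, a very large input could also be written to the output file row by row instead of being joined in memory.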


1

Do a search for <script and replace with <!--<script and search for </script> and replace with </script>-->.

This should comment out all scripts in the file.

1 Comment

This is far from a complete solution - see owasp.org/www-community/xss-filter-evasion-cheatsheet for some of the ways that people could evade this.
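To make the point concrete, here is a small Python sketch of the search-and-replace described in the answer, run against two inputs it does not catch. The function name `naive_comment_out` and the sample strings are made up for illustration.

```python
def naive_comment_out(markup: str) -> str:
    """Literal, case-sensitive version of the replacement from the answer."""
    return (
        markup.replace("<script", "<!--<script")
        .replace("</script>", "</script>-->")
    )


samples = [
    "<script>alert(1)</script>",     # handled: gets wrapped in a comment
    "<SCRIPT>alert(1)</SCRIPT>",     # missed: tag names are case-insensitive
    "<img src=x onerror=alert(1)>",  # missed: script in an event handler
]

for sample in samples:
    print(repr(naive_comment_out(sample)))
```

Only the first sample is changed; the other two pass through untouched and would still run in a browser.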
0

You need a sandbox or clean HTML code. Have a look at PHPIDS or HTML Purifier.
