30

On the one hand, if I have

<script>
var s = 'Hello </script>';
console.log(s);
</script>

the browser will end the <script> block early, at the </script> inside the string literal, and basically the page gets screwed up.
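To see why the page breaks, here is a rough sketch of the check the browser effectively makes, written in JavaScript for illustration. The helper name `endsScriptEarly` is hypothetical and it simplifies the HTML tokenizer's rules: inside a <script> element, `</script` in any letter case, followed by whitespace, `/` or `>`, ends the element, even though it sits inside a JS string.

```javascript
// Hypothetical, simplified check: would this text end an enclosing <script> element?
// The HTML tokenizer treats "</script" (any letter case) as the end tag when it is
// followed by whitespace, "/" or ">", regardless of JS string quoting.
function endsScriptEarly(text) {
  return /<\/script[\t\n\f\r \/>]/i.test(text);
}

console.log(endsScriptEarly("var s = 'Hello </script>';"));   // true
console.log(endsScriptEarly("var s = 'Hello </SCRIPT >';"));  // true
console.log(endsScriptEarly("var s = 'Hello <\\/script>';")); // false
```

This is also why both the `<\/script>` escape and the `'</scr' + 'ipt>'` split work: neither leaves the literal character sequence `</script` in the page source.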

On the other hand, the value of the string may come from a user (say, via a previously submitted form, and now the string ends up being inserted into a <script> block as a literal), so you can expect anything in that string, including maliciously formed tags. Now, if I escape the string literal with htmlentities() when generating the page, s will contain the escaped entities literally, i.e. console.log(s) will output

Hello &lt;/script&gt;

which is not desired behavior in this case.

One way of properly escaping JS strings within a <script> block is to escape the slash when it follows a left angle bracket, or simply to always escape the slash, i.e.

var s = 'Hello <\/script>';

This seems to work fine.
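A minimal server-side sketch of the slash escape, written in JavaScript (Node) for illustration; the function name `escapeForScriptBlock` is hypothetical. As an added assumption, it also backslash-escapes backslashes, quotes and line breaks, because escaping the slash alone does not stop a user-supplied quote from ending the literal:

```javascript
// Hypothetical helper: escape a value so it can sit between single or double
// quotes in a JS string literal inside a <script> block.
function escapeForScriptBlock(value) {
  return value
    .replace(/\\/g, "\\\\") // backslash first, so later escapes are not doubled
    .replace(/'/g, "\\'")
    .replace(/"/g, '\\"')
    .replace(/\n/g, "\\n")
    .replace(/\r/g, "\\r")
    .replace(/\//g, "\\/"); // "</script>" becomes "<\/script>"
}

const userValue = "Hello </script>";
console.log("var s = '" + escapeForScriptBlock(userValue) + "';");
// var s = 'Hello <\/script>';
```

The backslash has to be handled first: if it came last, it would also double the backslashes introduced by the other replacements.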

Then comes the question of JS code within HTML event handlers, which can be easily broken too, e.g.

<div onClick="alert('Hello ">')"></div>

looks valid at first but breaks in browsers, because the HTML parser ends the onClick attribute value at the second double quote. This obviously requires full HTML entity encoding.
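For an event-handler attribute the escaping has to be layered: first make the value safe inside the JS string literal, then HTML-encode the whole handler code as an attribute value. A minimal sketch in JavaScript (Node) for illustration; both function names are hypothetical, and it assumes a single-quoted JS literal inside a double-quoted attribute:

```javascript
// Hypothetical helper: escape a value for a single-quoted JS string literal.
function escapeJsSingleQuoted(value) {
  return value
    .replace(/\\/g, "\\\\")
    .replace(/'/g, "\\'")
    .replace(/\n/g, "\\n")
    .replace(/\r/g, "\\r");
}

// Hypothetical helper: encode text for a double-quoted HTML attribute value.
function escapeHtmlAttribute(text) {
  return text
    .replace(/&/g, "&amp;") // ampersand first, so the entities added below are not re-encoded
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

const userValue = 'Hello ">';
const handler = "alert('" + escapeJsSingleQuoted(userValue) + "')"; // JS layer
const html = '<div onClick="' + escapeHtmlAttribute(handler) + '"></div>'; // HTML layer
console.log(html);
// <div onClick="alert(&#39;Hello &quot;&gt;&#39;)"></div>
```

The browser decodes the attribute back to `alert('Hello ">')` before handing it to the JS engine, which is why the two layers are applied in this order.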

My question is: what is the best/standard practice for properly covering all the situations above - i.e. JS within a script block, JS within event handlers - if your JS code can partly be generated on the server side and can potentially contain malicious data?


5 Answers

45

The following characters could interfere with an HTML or Javascript parser and should be escaped in string literals: <, >, ", ', \, and &.

In a script block, using the escape character works, as you found out. The concatenation method ('</scr' + 'ipt>') can be hard to read.

var s = 'Hello <\/script>';

For inline Javascript in HTML, you can use entities:

<div onClick="alert('Hello &quot;>')">click me</div>

Demo: http://jsfiddle.net/ThinkingStiff/67RZH/

The method that works in both <script> blocks and inline Javascript is \uxxxx, where xxxx is the hexadecimal character code.

  • < - \u003c
  • > - \u003e
  • " - \u0022
  • ' - \u0027
  • \ - \u005c
  • & - \u0026
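The list can be turned into a single server-side function. A minimal sketch in JavaScript (Node) for illustration; `toUnicodeEscapes` is a hypothetical name, and it covers only the six characters listed (line breaks are not handled):

```javascript
// Hypothetical helper: replace each listed character with its \uXXXX escape, so the
// result is safe in a <script> block and in an inline handler, whichever quote
// character delimits the literal or the attribute.
function toUnicodeEscapes(value) {
  return value.replace(/[<>"'\\&]/g, function (ch) {
    // charCodeAt returns the UTF-16 code unit; pad its hex form to four digits.
    return "\\u" + ch.charCodeAt(0).toString(16).padStart(4, "0");
  });
}

console.log(toUnicodeEscapes("Hello </script>")); // Hello \u003c/script\u003e
console.log(toUnicodeEscapes('Hello ">'));        // Hello \u0022\u003e
```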

Demo: http://jsfiddle.net/ThinkingStiff/Vz8n7/

HTML:

<div onClick="alert('Hello \u0022>')">click me</div>

<script>
    var s = 'Hello \u003c/script\u003e';
    alert(s);
</script>

2 Comments

The hex escape method is the best so far: you don't have to worry where your string ends up in the code, just send everything through one basic server-side function. Great, I like it!
Shouldn't newline - \u000a be on that list as well?
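The commenter's point can be checked directly. A small sketch, run in Node for illustration, with a hypothetical helper `parses` that uses the `Function` constructor to compile code without executing it; a carriage return (\u000d) breaks a quoted literal in the same way, so it belongs on the list too:

```javascript
// Hypothetical helper: does this source text compile? (It is never executed.)
function parses(source) {
  try {
    new Function(source);
    return true;
  } catch (e) {
    return false;
  }
}

// A raw line break inside a quoted string literal is a syntax error,
// while the escaped forms \n and \u000a are fine.
console.log(parses("var s = 'line one\nline two';"));      // false
console.log(parses("var s = 'line one\\nline two';"));     // true
console.log(parses("var s = 'line one\\u000aline two';")); // true
```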
2

I'd say the best practice would be avoiding inline JS in the first place.

Put the JS code in a separate file and include it with the src attribute:

<script src="path/to/file.js"></script>

and use that script to set the event handlers instead of putting them in the HTML.

// jQuery example
$('div.something').on('click', function(){
    alert('Hello>');
});

2 Comments

And what if I have my reasons for using inline code? For efficiency, saving traffic, connections, etc. on a highly loaded web site.
@mojuba: Well, by the time you get to this kind of performance tuning most best practices have already been thrown out the window :)
2

(edit: somehow I didn't notice you had already mentioned the slash escape in your question...)

OK so you know how to escape a slash.

In inline event handlers, you can't use the quote character that bounds the attribute inside your JS literal, so bound the attribute with the other one:

<div onClick='alert("Hello \"")'>test</div>

But this is all in aid of making your life difficult. Just don't use inline event handlers! Or if you absolutely must, then have them call a function defined elsewhere.

Generally speaking, there are few reasons for your server-side code to be writing javascript. Don't generate scripts from the server - pass data to pre-written scripts instead.
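A minimal sketch of passing data instead of generating code, assuming a Node server that builds the markup as a string. The value goes into a `data-*` attribute with ordinary HTML attribute encoding, and a pre-written script reads it back (for example through the DOM `dataset` property) with no JS escaping at all. The helper names and the `mule` id are hypothetical:

```javascript
// Hypothetical helper: encode text for a double-quoted HTML attribute value.
function escapeHtmlAttribute(text) {
  return text
    .replace(/&/g, "&amp;") // first, so the entities added below stay intact
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Hypothetical server-side renderer: no JavaScript is generated, only data.
function renderDataSpan(id, text) {
  return '<span hidden id="' + escapeHtmlAttribute(id) +
    '" data-text="' + escapeHtmlAttribute(text) + '"></span>';
}

console.log(renderDataSpan("mule", 'Hello </script> "quoted"'));
// <span hidden id="mule" data-text="Hello &lt;/script&gt; &quot;quoted&quot;"></span>

// In the pre-written script file, the original string comes back decoded:
//   var s = document.getElementById("mule").dataset.text;
```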

(original)

You can escape any character in a JS string literal with a backslash, as long as the backslash and that character do not form a special escape sequence (such as \n or \t):

var s = 'Hello <\/script>';

This also has the positive effect of preventing the sequence from being interpreted as HTML. So inside string literals you could do a blanket replace of "/" with "\/" with no ill effect.

Generally, though, I am concerned that you would have user-submitted data embedded as a string literal in javascript. Are you generating javascript code on the server? Why not just pass data as JSON or an HTML "data" attribute or something instead?
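If the data does have to be inlined in a <script> block, one widely used pattern (not taken from this answer) is to serialize it with `JSON.stringify` and then replace `<`, `>` and `&` with `\uXXXX` escapes, which are valid in both JSON and JS strings. U+2028 and U+2029 are escaped as well, on the assumption that older engines may reject them inside string literals. A minimal Node sketch; `toInlineJson` is a hypothetical name:

```javascript
// Hypothetical helper: serialize data for "var x = <output>;" inside <script>.
// In JSON.stringify output these characters can only occur inside strings,
// so replacing them with \uXXXX escapes keeps the result valid JSON and JS.
function toInlineJson(data) {
  return JSON.stringify(data).replace(/[<>&\u2028\u2029]/g, function (ch) {
    return "\\u" + ch.charCodeAt(0).toString(16).padStart(4, "0");
  });
}

const payload = { greeting: "Hello </script>", note: "a & b" };
console.log("<script>var data = " + toInlineJson(payload) + ";</script>");
// <script>var data = {"greeting":"Hello \u003c/script\u003e","note":"a \u0026 b"};</script>
```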

5 Comments

Re: passing strings to JS: it's a valid point to use, say, JSON instead, but I'm trying to save some traffic and connections by inserting data directly into HTML/JS. For small amounts of data I think it's OK.
This technique can only cost you in terms of bandwidth, since such scripts cannot be cached by the browser. Quick and dirty, stick it in a hidden element: <span style="display:none;" id="mule" data-text="... attributed encoded text or JSON structure"></span> There's no rule against doing it however you want, but it sure saves a lot of headaches and makes for easier, more secure, more maintainable code to avoid generating scripts.
Re: your solution with reverting the bounding characters will require my server-side code to look for quotes within my JS snippet and decide whether it should be enclosed in single or double quotes. Getting too complicated. Far easier to just escape everything like any HTML literal text.
Except it won't work reliably because javascript isn't a literal. You need to combine the rules for escaping within javascript literals, and the rules for escaping within an HTML element, which is pretty darn complicated all of a sudden. A double-quote inside single-quotes becomes &quot; but what about a double-quote that's bounding a string literal? Answer is simple: avoid inline scripts. Pass data instead.
To be honest, I already fixed my code and it works. Rule #1: when generating a JS string literal on the server, escape quotes, newlines and slash with backslash. Rule #2: when inserting anything into HTML other than JS code in the script block, escape as usual with htmlentities().
2

Here's how I do it:

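function encode(r) {
    // Replace each matched character with its numeric HTML entity, e.g. & -> &#38;
    return r.replace(/[\x26\x0A<>'"]/g, function (c) {
        return "&#" + c.charCodeAt(0) + ";";
    });
}

var myString = 'Encode HTML entities!\n"Safe" escape <script></' + 'script> & other tags!';

document.getElementById("test").value = encode(myString);

document.getElementById("testing").innerHTML = encode(myString);

/*************
* \x26 is the ampersand (all matches are replaced in a single pass,
*   so its position in the character class does not matter),
* \x0A is newline.
*************/

HTML:

<textarea id="test" rows="9" cols="55"></textarea>

<div id="testing">www.WHAK.com</div>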


-2

Most people use this trick:

var s = 'Hello </scr' + 'ipt>';

1 Comment

So if the code is generated on the server side, I need to look for <script> and replace it with the broken one? Isn't it easier to just escape the slashes?
