
I'm loading some HTML via Ajax with this format:

<div id="div1">
  ... some content ...
</div>
<div id="div2">
  ...some content...
</div>
... etc.

I need to iterate over each div in the response and handle it separately. Having a separate string for the HTML content of each div mapped to the id would satisfy my requirements. However, the divs may contain script tags, which I need to preserve but not execute (they'll execute later when I stick the HTML into the document, so executing during parsing would be bad). My first thought was to do something like this:

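// data being the result from $.get
// [\s\S] lets a match span lines; the "/" in the closing tag must be escaped in a regex literal
var clean = data.replace(/<script[\s\S]*?<\/script>/gi, function (tag) {
    // insert some unique token, save the tag, put it back while I'm processing
});

$('<div/>').html(clean).children().each(function () { /* ... process here ... */ });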
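For reference, here is a minimal sketch of the token idea above, written as plain string functions so it runs in Node without jQuery. protectScripts, restoreScripts and the <!--script-token-N--> format are hypothetical names chosen for illustration, and the token is assumed never to appear in real content. Using an HTML comment as the token rests on the further assumption that comment nodes pass through the jQuery parse and come back out of .html() unchanged.

```javascript
// Hypothetical helpers; the token format is an arbitrary choice.
function protectScripts(html) {
  const scripts = [];
  // Same lazy pattern as above: each match stops at the first closing tag.
  const clean = html.replace(/<script[\s\S]*?<\/script>/gi, (tag) => {
    scripts.push(tag); // save the original tag text
    return `<!--script-token-${scripts.length - 1}-->`; // inert placeholder
  });
  return { clean, scripts };
}

function restoreScripts(html, scripts) {
  // Put each saved tag back where its token sits.
  return html.replace(/<!--script-token-(\d+)-->/g, (_, i) => scripts[Number(i)]);
}

const data = '<div id="div1"><p>a</p><script>var x = 1;<\/script></div>';
const { clean, scripts } = protectScripts(data);
console.log(clean); // <div id="div1"><p>a</p><!--script-token-0--></div>
console.log(restoreScripts(clean, scripts) === data); // true
```

Because the replacement is a function rather than a string, any "$" characters inside a saved script are put back literally instead of being read as replacement patterns.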

But I worry that some stupid dev is going to come along and put something like this in one of the divs:

<script> var foo = '</script>'; // ... </script>

Which would screw it all up. Not to mention, the whole thing feels like a hack to begin with. Does anyone know a better way?

EDIT: Here's the solution I've come up with:

var divSplitRegex = /(?:^|<\/div>)\s*<div\s+id="prefix-(.+?)">/g,
    idReplacement = preDelimeter + '$1' + postDelimeter;
var r = data.replace(/<\/div>\s*$/, '')
    .replace(divSplitRegex, idReplacement)
    .split(preDelimeter);
$.each(r, function (i, chunk) {
    // use the value argument: "this" is a boxed String here, so it is truthy even when empty
    if (chunk) {
        callback.apply(null, chunk.split(postDelimeter));
    }
});

Here preDelimeter and postDelimeter are just unique strings like "###I'd have to be an idiot to embed this string in my content unescaped because it would break everything###", and callback is a function expecting the div id and the div content. This only works because I know that the divs will have only an id attribute, and that the id will have a special prefix. I suppose someone could put a div in their content with an id having the same prefix, and that would break things too.
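To make the split easier to follow, here is the same logic with plain array methods in place of $.each, so it runs in Node. The delimiter values, the splitDivs name and the sample input are assumptions for illustration only; the callback just records the (id, content) pairs it receives.

```javascript
// Assumed delimiter values; any strings that never occur in the content would do.
const preDelimeter = '###PRE###';
const postDelimeter = '###POST###';

// Hypothetical wrapper around the approach described above.
function splitDivs(data, callback) {
  const divSplitRegex = /(?:^|<\/div>)\s*<div\s+id="prefix-(.+?)">/g;
  const idReplacement = preDelimeter + '$1' + postDelimeter;
  data
    .replace(/<\/div>\s*$/, '')            // drop the final closing tag
    .replace(divSplitRegex, idReplacement) // each div boundary becomes PRE + id + POST
    .split(preDelimeter)
    .forEach((chunk) => {
      if (chunk) {                         // skip the empty chunk before the first marker
        callback(...chunk.split(postDelimeter));
      }
    });
}

const found = [];
splitDivs(
  '<div id="prefix-a">\n  <p>one</p>\n</div>\n<div id="prefix-b">two</div>',
  (id, content) => found.push([id, content])
);
// found: [['a', '\n  <p>one</p>\n'], ['b', 'two']]
```

Note that the whitespace inside each div is kept, while the whitespace between a closing </div> and the next opening tag is swallowed by the \s* in the regex.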

So, I still don't love this solution. Anyone have a better one?

2 Answers


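FYI, using </script> unescaped inside any inline JavaScript causes this issue in a browser too: the HTML parser ends the script element at the first </script> it sees, even inside a string literal. Developers have to escape it anyway (for example as <\/script>), so there is no excuse, and you can "trust" that such code would already be broken.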

<body>
 <div>
   <script>
     alert('<script> tags </script> are not '+
         'valid in regular old HTML without being escaped.');
   </script>
 </div>
</body>

See http://jsbin.com/itevu to see it break. :)
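A small Node sketch (not from either answer) of why this convention makes the lazy regex from the question safe: on the unescaped example the match stops at the same </script> where the browser would end the element, and once the literal is written as <\/script> the whole element is matched. The variable names are illustrative.

```javascript
// The lazy pattern from the question, without the g flag so match() returns the first hit.
const scriptRe = /<script[\s\S]*?<\/script>/i;

// Unescaped: the match ends at the "</script>" inside the string literal,
// which is also where the HTML parser would close the element.
const unescaped = "<script> var foo = '</script>'; // ... </script>";
console.log(unescaped.match(scriptRe)[0]); // <script> var foo = '</script>

// Escaped: the page text now contains <\/script>, which is not a closing tag,
// so the regex (like the parser) runs on to the real end of the element.
const escaped = "<script> var foo = '<\\/script>'; // ... </script>";
console.log(escaped.match(scriptRe)[0] === escaped); // true
```

In JavaScript, '<\/script>' and '</script>' are the same string value, so the escape costs nothing at runtime.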


Comment:

I guess that means my first solution will be safe. I don't love it, but it works.

In some cases, removing script tags results in invalid HTML:

 <html>
    <head>
    </head>
    <body>
        <p>This should be
        <script type="text/javascript">
            document.writeln("<b");
        </script>>bolded</b>.
    </body>
 </html>
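A quick Node sketch of what deleting the script (rather than swapping in a token) does to the body of this example: the markup the script was meant to complete is left as a stray ">" and an unmatched "</b>". Even with a token in place of deletion, that surrounding text is not well-formed on its own, so parsing and re-serializing it would typically not give it back unchanged.

```javascript
// The paragraph from the example above; <\/script> in a template literal is just "</script>".
const page = `<p>This should be
<script type="text/javascript">
    document.writeln("<b");
<\/script>>bolded</b>.`;

// Delete every script element with the lazy pattern from the question.
const stripped = page.replace(/<script[\s\S]*?<\/script>/gi, '');
console.log(JSON.stringify(stripped)); // "<p>This should be\n>bolded</b>."
```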
