
I have written some code that takes a string of HTML and uses jQuery to clean away any ugly markup from it (see an early prototype in this SO question). It works pretty well, but I have run into an issue:

When I use .append() to wrap the HTML in a div, all script elements in it are evaluated and run (see this SO answer for an explanation of why this happens). I don't want this. Ideally the scripts would be removed altogether, but I can handle that later myself as long as they are not run.

I am using this code:

var wrapper = $('<div/>').append($(html));

I tried to do it this way instead:

var wrapper = $('<div>' + html + '</div>');

But that just brings back the "Access denied" error in IE that the append() approach avoids (see the answer I referenced above).

I think I might be able to rewrite my code to not require a wrapper around the html, but I am not sure, and I'd like to know if it is possible to append html without running scripts in it, anyway.

My questions:

  • How do I wrap a piece of unknown html without running scripts inside it, preferably removing them altogether?

  • Should I throw jQuery out the window and do this with plain JavaScript and DOM manipulation instead? Would that help?

What I am not trying to do:

I am not trying to put some kind of security layer on the client side. I am very much aware that it would be pointless.

Update: James' suggestion

James suggested that I should filter out the script elements, but look at these two examples (the original first, then James's suggestion):

jQuery("<p/>").append("<br/>hello<script type='text/javascript'>console.log('gnu!'); </script>there")

This keeps the text nodes but writes gnu! to the console.

jQuery("<p/>").append(jQuery("<br/>hello<script type='text/javascript'>console.log('gnu!'); </script>there").not('script'))

This doesn't write gnu!, but it also loses the text nodes.

Update 2:

James has updated his answer and I have accepted it. See my latest comment to his answer, though.

3 Answers


How about removing the scripts first?

var wrapper = $('<div/>').append($(html).not('script'));

Update: since that approach loses text nodes, a more reliable way is:

  • Create the div container
  • Use plain JS to put html into the div
  • Remove all script elements in the div

Assuming script elements in the html are not nested in other elements:

var wrapper = document.createElement('div');
wrapper.innerHTML = html;
$(wrapper).children().remove('script');

If script elements may be nested inside other elements, use find() instead of children():

var wrapper = document.createElement('div');
wrapper.innerHTML = html;
$(wrapper).find('script').remove();

This also works when html is plain text with no tags, and when html has text outside any elements.
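
On the question of doing this without jQuery, the same steps can be sketched with plain DOM calls. This is only a minimal sketch, not a cross-browser tested implementation: the name wrapWithoutScripts and the doc parameter are hypothetical, and it assumes a browser that supports querySelectorAll.

```javascript
// Hypothetical helper: wraps an HTML string in a div and strips script elements.
// `doc` is the document used to create the wrapper (normally window.document).
function wrapWithoutScripts(html, doc) {
  var wrapper = doc.createElement('div');
  // Scripts inserted through innerHTML are parsed but not executed.
  wrapper.innerHTML = html;
  // querySelectorAll returns a static list, so removing nodes while looping is safe.
  var scripts = wrapper.querySelectorAll('script');
  for (var i = 0; i < scripts.length; i++) {
    scripts[i].parentNode.removeChild(scripts[i]);
  }
  return wrapper;
}

// Usage in a browser:
//   var wrapper = wrapWithoutScripts(html, document);
```

Note that this only removes script elements; inline event handler attributes such as onerror are left in place.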


7 Comments

Good idea, and it almost works, but not quite. The string that I, somewhat carelessly, call html might contain text outside of tags, and I want that too. See my updated question for examples.
Additionally, the "html" might be just text, with no tags, and then jQuery will treat it as a selector if it is sent to $()/jQuery(), according to the docs: api.jquery.com/jQuery/#jQuery2
And the not() only works if script is not wrapped inside another element.
This is promising! With a little modification it seems to work for nested scripts too: jQuery(wrapper).find('script').remove(). It works in Chrome, now I'll try it in IE (cringe)
The proof of concept works in IE too. Now I'll add it to my real code.

You should remove the script elements:

var wrapper = $('<div/>').append($(html).remove("script"));

Second attempt:

node-validator can be used in the browser: https://github.com/chriso/node-validator

var str = sanitize(large_input_str).xss();

Alternatively, PHPJS has a strip_tags function (regex/evil based): http://phpjs.org/functions/strip_tags:535

4 Comments

Try this code: jQuery("<div/>").append(jQuery("<br/>hello<script type='text/javascript'>console.log('gnu!'); </script>there").remove('script')). It still runs the script tag, actually, and removes just one of the text nodes... Interesting...
Yeah, the script is executed on $(html)… It seems the solution would be to remove the script elements with a regex?
Yes, you need a real parser, something BIG like json2.js but for removing script tags :-)
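
The regex idea from these comments could look roughly like the sketch below. As the last comment points out, a regex is not a real parser, so treat this as an illustration under that caveat and not as a robust filter; the function name stripScriptTags is hypothetical.

```javascript
// Hypothetical helper: removes <script>...</script> blocks from a string
// before it is ever handed to jQuery or the DOM.
// A regex is not an HTML parser; unusual or malformed markup can slip through.
function stripScriptTags(html) {
  return html.replace(/<script\b[^>]*>[\s\S]*?<\/script\s*>/gi, '');
}

var cleaned = stripScriptTags(
  "<br/>hello<script type='text/javascript'>console.log('gnu!'); </script>there"
);
// cleaned === "<br/>hellothere"
```

Because the scripts are gone before the string reaches $() or append(), nothing is left to execute, and the text around the script is kept.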

The scripts in the html kept executing for me with all the simple methods mentioned here. Then I remembered that jQuery has a tool for this (since 1.8): jQuery.parseHTML. There is still a catch: according to the documentation, event handlers in attributes (e.g. <img onerror>) will still run.

This is what I'm using:

var $dom = $($.parseHTML(d));

$dom will be a jQuery object containing the parsed elements.
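
To get the wrapper div the question asks for, the parsed nodes can be appended to a fresh div. This is a sketch under assumptions: wrapParsed is a hypothetical name, jQuery (1.8 or later) is passed in as `$`, and parseHTML is called with its default of not keeping scripts.

```javascript
// Hypothetical helper: parses an HTML string without running its scripts
// and wraps the resulting nodes in a div.
// `$` is jQuery 1.8 or later, passed in explicitly.
function wrapParsed(html, $) {
  // parseHTML leaves out script elements unless keepScripts is true.
  // Some jQuery versions return null for empty input, so fall back to [].
  var nodes = $.parseHTML(html) || [];
  return $('<div/>').append(nodes);
}

// Usage in a browser with jQuery loaded:
//   var wrapper = wrapParsed(html, jQuery);
```

Since the string goes to parseHTML and not to $(), a string without any tags should come back as a text node and not be treated as a selector.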
