Is it possible to pass HTML to a browser through JavaScript and parse it with jQuery, but not load external resources (scripts, images, Flash, anything)?

I will make do with the XML parser if that is the best I can do, but I would like to allow loose HTML if possible.

It must be compatible with Chrome, Firefox, and the latest IE.

  • I solved the issue. You can adapt this to replace the src of any tag: stackoverflow.com/questions/6671461/… Commented Jul 12, 2011 at 22:48
  • Since I do not control the source HTML, and there are too many tricky hacks, I cannot accept a regex answer. Sorry. Additionally, it makes sense that scripts should not be executed since they may decide to load external resources on their own. Commented Jul 13, 2011 at 0:43
  • 2021 - ten years later I'm facing the same problem. Using document.createElement will load resources like images in the background. I use a temporary DOMParser to avoid this (developer.mozilla.org/en-US/docs/Web/API/DOMParser). Commented Jul 12, 2021 at 6:09
  • This is a great alternative! Answer? Commented Jul 19, 2021 at 22:21
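
The DOMParser idea from the comment above can be sketched roughly as follows. This is a minimal illustration, not a tested cross-browser solution: parseInert is a hypothetical helper name, and it assumes a browser that supports the 'text/html' type in DOMParser. The resulting document is detached from the page, so scripts in it should not run and images should not be fetched while it is only being queried.

```javascript
// Sketch: parse untrusted HTML into a detached document instead of a
// <div> owned by the live page. parseInert is a hypothetical name.
function parseInert(html) {
  // 'text/html' selects the lenient HTML parser, so loose markup is accepted.
  var parser = new DOMParser();
  return parser.parseFromString(html, 'text/html');
}

// Example use in a browser (assumes jQuery is loaded as $):
// var doc = parseInert('<p>Hi <img src="http://example.com/a.png"></p>');
// var text = $(doc.body).find('p').text();
```

Moving nodes from the parsed document into the live page would make them behave like ordinary elements again, so anything that must not load should be stripped before that step.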

1 Answer

var html = someHTML; // the passed-in HTML, maybe $('textarea#id').val()? I don't understand what you mean by 'passed in html'
var container = document.createElement('div');
// Note: a browser may start fetching images as soon as innerHTML is set,
// even though the div is not attached to the page yet.
container.innerHTML = html;
$(container).find('img,embed,head,script,style').remove();
// or
$(container).find('[src]').remove();

var target = someTarget; // place to put the parsed HTML
$(container).appendTo($(target));

EDIT

Tested working

var removeExt = function(cleanMe) {
    var $root = $(cleanMe);
    $root.find('*').each(function() { // visit every descendant element
      var el = this;
      // copy the live attribute list so iteration stays stable
      $.each($.makeArray(el.attributes), function() {
          // if the attribute value links externally (http, https, or protocol-relative //)
          if (/^\s*(https?:)?\/\//i.test(this.value)) {
           $(el).remove(); // ...take the element out
           return false; // element is gone, stop checking its attributes
          }
      });
    });
    $root.find('script').remove(); // also take out inline scripts, but only inside the parsed HTML
};

var html = someHTML;
var container = document.createElement('div');
container.innerHTML = html;
removeExt($(container));
var target = someTarget;
$(container).appendTo($(target));

This matches any attribute (src, href, data-foo, and so on) whose value starts with an external URL; http and https are both matched, and inline scripts are removed. If it's still a security concern, then maybe this should be done server side, or you could obfuscate your JS.


6 Comments

Interesting solution, but not particularly good for security. Sorry, it looks like the answer is "it can't be done without a blacklist".
By external, do you mean always hosted on a domain other than the domain of that of the script? Or do you mean any element that could not be depicted in a single HTML file?
I don't want any network activity to be performed when the parse occurs, and I don't want any scripts to run.
Edited. I hope this matches what you're looking for
Looks like the best bet, though I would use a whitelist instead of a blacklist.
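
The whitelist idea from the last comment could look something like the sketch below. Everything named here (ALLOWED_TAGS, ALLOWED_ATTRS, isAllowedAttr, applyWhitelist) is hypothetical and only illustrates the approach; the lists are deliberately tiny and would need to be adjusted and reviewed before being relied on for security. It assumes the root passed in belongs to an already parsed HTML tree, where tagName is upper case.

```javascript
// Hypothetical allow-lists: anything not named here is dropped.
var ALLOWED_TAGS = { P: true, B: true, I: true, EM: true, STRONG: true,
                     UL: true, OL: true, LI: true, A: true, SPAN: true, DIV: true };
var ALLOWED_ATTRS = { title: true, href: true };

// Keep an attribute only if its name is allow-listed; href is further
// restricted to in-page fragment links such as "#top".
function isAllowedAttr(name, value) {
  name = name.toLowerCase();
  if (!ALLOWED_ATTRS.hasOwnProperty(name)) return false;
  if (name === 'href') return /^#/.test(value);
  return true;
}

// Walk every element under root, removing disallowed elements entirely
// and stripping disallowed attributes from the ones that remain.
function applyWhitelist(root) {
  var elements = Array.prototype.slice.call(root.querySelectorAll('*'));
  elements.forEach(function (el) {
    if (!ALLOWED_TAGS.hasOwnProperty(el.tagName)) {
      if (el.parentNode) el.parentNode.removeChild(el);
      return;
    }
    // Copy the live attribute list before removing entries from it.
    Array.prototype.slice.call(el.attributes).forEach(function (attr) {
      if (!isAllowedAttr(attr.name, attr.value)) el.removeAttribute(attr.name);
    });
  });
  return root;
}
```

The difference from the blacklist version is the default: an unknown tag or attribute is removed rather than kept, so new ways of loading resources do not slip through just because nobody listed them.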