
In my code I have a parent DOM element, docElem. This is an iframe containing a complete HTML document. Now I want to remove all inline JavaScript from it. How can I do that in jQuery? Is there a selector that can pull out all elements that have an attribute whose name matches the regex on.*?

Please note that I am asking about all inline script, not onclick alone.

EDIT: To eliminate any confusion, here is some example code:

var docHtml = '<html><head></head><body ><img src="smiley.gif" ><img src="smiley2.gif" onfocus="methodCall()"><div  onclick="methodCall()" id="uvTab" ></div></body></html>';
var docElem = $($.parseHTML('<iframe></iframe>'))
     .append($.parseHTML(docHtml, true));
var tagList; // something here that returns the img tag and the div tag (the elements with on* attributes)
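jQuery has no built-in selector that matches attribute names against a regex (attribute selectors such as [attr^=value] match attribute values, not names), so one option is to test each element's attribute list by hand. The following is a minimal sketch under that assumption, not part of jQuery: hasInlineHandler, stripInlineHandlers and INLINE_HANDLER_NAME are hypothetical names, and the helpers rely only on the standard DOM Element members attributes and removeAttribute().

```javascript
// Hypothetical helpers (not part of jQuery or the DOM API). They use only
// the standard Element members `attributes` and `removeAttribute()`, so they
// accept plain DOM elements, e.g. each element of a jQuery collection.

// Matches inline event-handler attribute names such as onclick, onfocus, onload.
// Deliberately broad: any other attribute whose name starts with "on" matches too.
var INLINE_HANDLER_NAME = /^on/i;

// True if the element carries at least one on* attribute.
function hasInlineHandler(el) {
  return Array.prototype.some.call(el.attributes, function (attr) {
    return INLINE_HANDLER_NAME.test(attr.name);
  });
}

// Removes every on* attribute from the element.
function stripInlineHandlers(el) {
  // In a real DOM, el.attributes is a live NamedNodeMap, so collect the names
  // first; removing entries while iterating over it directly would skip some.
  var names = Array.prototype.map.call(el.attributes, function (attr) {
    return attr.name;
  });
  names.forEach(function (name) {
    if (INLINE_HANDLER_NAME.test(name)) {
      el.removeAttribute(name);
    }
  });
}
```

With the example above, tagList could then be built with docElem.find('*').addBack().filter(function () { return hasInlineHandler(this); }), and tagList.each(function () { stripInlineHandlers(this); }) would remove the handlers. Note that on* attributes are only one form of inline script: <script> elements and javascript: URLs in attributes such as href or src would need separate handling.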
  • What have you tried? Commented May 30, 2014 at 10:45
  • There is no simple regex approach like you hoped for. Commented May 30, 2014 at 10:48
  • @Tariq do you want to make changes inside the iframe? If yes, then it is not possible. Commented May 30, 2014 at 10:53
  • Your parent DOM node is an iframe? Do you mean that your entire "code" is in an HTML document that is loaded in an iframe of another page? Or is it the other way around, and your document is embedding another document in an iframe, and you want to remove those attributes from the code loaded in that iframe? Commented May 30, 2014 at 10:55
  • Added example code. Commented May 30, 2014 at 11:29

1 Answer


You can use the html-sanitizer from the Google Caja project. It can be used stand-alone in the browser.

You can get it from:

http://caja.appspot.com/html-css-sanitizer-minified.js

or:

http://caja.appspot.com/html-sanitizer-minified.js

(depending on whether or not you need to sanitize CSS as well)

You have to define two functions to tell the sanitizer how you want it to treat URLs and element IDs (I'll name them sanUrl() and sanId() here).

For example, you may want to remove IDs completely so that they don't interfere with your own IDs:

function sanId(id) {
  return undefined;
}

or you may want to add some prefix:

function sanId(id) {
  return "PREFIX" + id;
}

or just use them unchanged if that's OK for you:

function sanId(id) {
  return id;
}

The same goes for URLs:

function sanUrl(url) {
  // sanitize urls if needed
  // eg. add a prefix or remove relative/absolute urls etc.
  return url;
}

Now you can use the html_sanitize() function like this:

var sanitizedHtml = html_sanitize(originalHtml, sanUrl, sanId);

It will strip much more than what you described, which means you won't get into trouble with input you haven't anticipated.

It will also strip the html, head and body tags, so if you need them you can add them back:

var fullHtml = "<html><head></head><body>" + sanitizedHtml + "</body></html>";

You can also, for example, get the image URLs using code like this:

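// Parse explicitly: $(string) treats a string that doesn't start with "<" as a selector
$($.parseHTML(sanitizedHtml))
  .find('img').addBack().filter('img') // nested and top-level <img> elements
  .each(function (i, el) {
    var url = $(el).attr('src');
    // do something with the URL:
    console.log(url);
  });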

See this demo:

http://codepen.io/rsp/pen/hLmcE


  • Thanks. I was looking for something like this. No idea why this answer had a downvote on it.
