Extract and replace content from multiple HTML tags in Javascript

Question

I'm working on a new JavaScript plugin that loads an HTML page using an Ajax request, extracts all the scripts from the page, and then executes them once all the content has been loaded. To do this I'm trying something like this:

var scripts = '',
    domResponse = $('<div/>').append(HTMLresponse
        .replace(/<\s*script\s*([^>]*)>((.|\n)*)<\s*\/\s*script>/i, function($0,$1,$2){
            scripts += $2;
            return '';
        }));
// Then I load the content and I execute scripts

When I try with a page containing a single script tag it works fine, but if I try with a page like this:

<script>
   // Some javascript
</script>

<!-- SOME HTML -->

<script>
   // Another script
</script>

domResponse is empty and scripts contains the text between the first <script> and the last </script>.

Is there any way to make this work properly?

Elliot Bonneville · Accepted Answer · 2012-05-10 16:10:43Z

3

If I understand what you're attempting to do, would this work?

var scriptElements = document.getElementsByTagName("script");
var scripts = "";

for (var i = 0, len = scriptElements.length; i < len; i++) {
    scripts += scriptElements[i].innerHTML;
    scriptElements[i].innerHTML = "";
}

// load content and execute scripts

answered May 10, 2012 at 16:10

Elliot Bonneville


2 Comments

Parth Thakkar Over a year ago

+1 for dropping the regex for this simple task. TiuSh - this is how I would recommend doing it... but if you want to use regex, then the problem is that you aren't using the 'g' (global) flag.

TiuSh Over a year ago

Actually I want to extract script tags from a string, not from the document. First I tried jQuery with "$('<div/>').append(HTMLresponse).find('script');", but jQuery executes the scripts when "append" is called. That's why I used regex. But maybe there's a workaround using pure JavaScript?

Andrew Cheong · Accepted Answer · 2012-05-10 20:17:16Z

0

Like others, I'd recommend against using regex for anything HTML-related.

However, ignoring that, I can still answer your question. Your problem is that you are using a greedy quantifier, i.e. (.|\n)*, which "eats" as much as it can, as long as it ends with </script>. What you want is a non-greedy quantifier, like this:

<\s*script\s*([^>]*)>((.|\n)*?)<\s*\/\s*script>

See here: http://rubular.com/r/U2vvOW6XfZ.
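As a rough sketch of how that non-greedy pattern might be dropped into the question's replace call: the version below also adds the g flag so every script tag is handled, and swaps (.|\n) for [\s\S], which matches the same "any character including newlines" but is my substitution, not part of the pattern above. The names extractScripts and sampleHtml are hypothetical, and a newline is appended after each script body so that consecutive scripts don't run together.

```javascript
// Sketch: pull every <script> body out of an HTML string with a lazy pattern.
// extractScripts and sampleHtml are hypothetical names used for illustration.
function extractScripts(html) {
    var scripts = '';
    // 'g' handles every match, 'i' ignores case; [\s\S]*? lazily matches any
    // character (newlines included) up to the nearest closing script tag.
    var pattern = /<\s*script\s*([^>]*)>([\s\S]*?)<\s*\/\s*script>/gi;
    var stripped = html.replace(pattern, function (match, attrs, body) {
        scripts += body + '\n'; // separator so two scripts never fuse into one statement
        return '';              // remove the whole tag from the markup
    });
    return { html: stripped, scripts: scripts };
}

var sampleHtml = '<script>var a = 1;</script>\n<p>Some HTML</p>\n<script>var b = 2;</script>';
var result = extractScripts(sampleHtml);
// result.html holds the markup without the script tags,
// result.scripts holds both script bodies, one per line.
```

With this, the stripped markup can be appended first and the collected scripts executed afterwards, as in the original plan.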

Note that the regex will break if any attribute in a script tag contains a >; if the script for some reason includes a </script> within it (perhaps in a comment); if the page, in general, has commented out a script; etc. This is why it's much better to use a parser.
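To make two of those caveats concrete, here is a small sketch (scriptBodies and the sample strings are made up for illustration) showing what the pattern captures when an attribute value contains a > and when a script sits inside an HTML comment:

```javascript
// Sketch of two failure modes of the regex approach.
// scriptBodies and the sample inputs are hypothetical, for illustration only.
var pattern = /<\s*script\s*([^>]*)>([\s\S]*?)<\s*\/\s*script>/gi;

function scriptBodies(html) {
    var bodies = [];
    // replace() is used only as a convenient way to visit every match
    html.replace(pattern, function (match, attrs, body) {
        bodies.push(body);
        return '';
    });
    return bodies;
}

// An attribute value containing ">" makes [^>]* stop early, so the rest of
// the opening tag leaks into the captured body.
var withAttr = scriptBodies('<script data-note="a>b">run();</script>');

// A script inside an HTML comment is still captured, even though an HTML
// parser would treat it as comment text and never run it.
var commented = scriptBodies('<!-- <script>old();</script> -->');
```

Here withAttr[0] comes out as b">run(); rather than run();, and commented[0] is old(); even though that script was commented out.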

edited May 10, 2012 at 20:17

answered May 10, 2012 at 20:12

Andrew Cheong


2 Comments

Andrew Cheong Over a year ago

@ElliotBonneville - I'm voting your answer up; it's better.

TiuSh Over a year ago

Great! It works! Thank you for your help! But you're right, I'll try to find a way to do this without regex...
