How to find "<script>" tag from the string with JAVASCRIPT/ regular expression

Question

I need to check whether an incoming string contains the text <script.

Example:

string a = "This is a simple <script> string";

Now, I need to write a regular expression that will tell me whether this string contains a <script> tag or not.

I ended up writing something like: <* ?script.* ?>

But the challenge is that the incoming string may contain script in any of the following forms:

string a = "This is a simple <script> string";
string a = "This is a simple < script> string";
string a = "This is a simple <javascript></javascript> string";
string a = "This is a simple <script type=text/javascript> string";

Hence, the regular expression should look for an opening < and then check for script.

Please read owasp.org/index.php/XSS_Filter_Evasion_Cheat_Sheet. It may be highly relevant here.
– Prinzhorn, Commented May 16, 2013 at 11:13

Let Me Tink About It · Accepted Answer · 2020-04-03 06:29:24Z

54

Use:

/<script[\s\S]*?>[\s\S]*?<\/script>/gi

@bodhizero’s accepted answer of <[^>]*script incorrectly returns true under the following conditions:

// Not a proper script tag.
const a = "This is a simple < script> string";

// Space added before "img", otherwise the entire tag fails to render here.
const b = "This is a simple < img src='//example.com/script.jpg'> string";

// Picks up "nonsense code" just because a '<' character happens to precede a 'script' string somewhere along the way.
const c = "This is a simple for(i=0;i<5;i++){alert('script')} string";
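To make the comparison concrete, here is a small sketch that runs both patterns against those three strings. The variable names (`loosePattern`, `strictPattern`, `samples`) are made up for illustration, and the `g` flag is left off because only a yes/no answer is needed.

```javascript
// Pattern from bodhizero's answer (all variable names here are hypothetical).
const loosePattern = /<[^>]*script/i;
// Pattern from this answer, without the g flag since we only call test().
const strictPattern = /<script[\s\S]*?>[\s\S]*?<\/script>/i;

const samples = [
  "This is a simple < script> string",
  "This is a simple < img src='//example.com/script.jpg'> string",
  "This is a simple for(i=0;i<5;i++){alert('script')} string",
];

const looseResults = samples.map((s) => loosePattern.test(s));
const strictResults = samples.map((s) => strictPattern.test(s));

console.log(looseResults);  // every entry is true (false positives)
console.log(strictResults); // every entry is false

// A complete element is still detected by the stricter pattern.
const completeTag = strictPattern.test("a <script>alert(1)</script> b");
// An opening tag with no closing tag is not, because the pattern requires </script>.
const openOnly = strictPattern.test("This is a simple <script> string");
console.log(completeTag, openOnly); // true false
```

Note the last line: the stricter pattern only reports a match when a closing </script> is present, so the first example string from the question would not be flagged by it.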

Here is an excellent resource for building and testing regular expressions.
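One caveat if you use this pattern for a simple yes/no check: a regex with the `g` flag remembers where its last match ended (`lastIndex`), so calling `test()` on it repeatedly can alternate between true and false. The sketch below illustrates this; the variable names are illustrative only.

```javascript
const text = "a <script>alert(1)</script> b";

// With the g flag, test() resumes from lastIndex on the next call.
const globalRe = /<script[\s\S]*?>[\s\S]*?<\/script>/gi;
const first = globalRe.test(text);  // true, lastIndex now points past the match
const second = globalRe.test(text); // false, nothing left after lastIndex
console.log(first, second); // true false

// Without g, every call starts from index 0.
const plainRe = /<script[\s\S]*?>[\s\S]*?<\/script>/i;
const third = plainRe.test(text);
const fourth = plainRe.test(text);
console.log(third, fourth); // true true
```

For validation, dropping `g` (or using `String.prototype.match` / `search`) avoids the surprise.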

edited Apr 3, 2020 at 6:29

answered Oct 4, 2013 at 18:53

Let Me Tink About It


2 Comments

Pankaj Goyal Over a year ago

And what if URL (percent) encoding is used? Then the string must also be validated against this regex: /%3Cscript[\s\S]*?%3E[\s\S]*?%3C\/script%3E/gi. One more thing: the "i" flag is needed for case insensitivity.

Randall Coding Over a year ago

This yields a "catastrophic backtracking" error when tried in regex101.com on my HTML

Let Me Tink About It · Accepted Answer · 2017-01-27 03:30:18Z

5

Try this:

/(<|%3C)script[\s\S]*?(>|%3E)[\s\S]*?(<|%3C)(\/|%2F)script[\s\S]*?(>|%3E)/gi
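An alternative to listing every encoded variant inside the regex is to decode the input first and then run a plain pattern. This is only a sketch under some assumptions: the helper name `containsScriptTag` is hypothetical, it decodes a single layer of percent-encoding, and it falls back to the raw input when decoding fails.

```javascript
const scriptRe = /<script[\s\S]*?>[\s\S]*?<\/script[\s\S]*?>/i;

// Hypothetical helper: decode percent-encoding once, then test the result.
function containsScriptTag(input) {
  let decoded = input;
  try {
    decoded = decodeURIComponent(input);
  } catch (err) {
    // decodeURIComponent throws URIError on malformed sequences (e.g. a lone "%"),
    // so in that case we keep the raw input.
  }
  return scriptRe.test(decoded);
}

console.log(containsScriptTag("%3Cscript%3Ealert(1)%3C%2Fscript%3E")); // true
console.log(containsScriptTag("<script>alert(1)</script>"));           // true
console.log(containsScriptTag("100% safe text"));                      // false
```

Input that has been encoded twice would need a second decode pass, which this sketch does not attempt.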

edited Jan 27, 2017 at 3:30

Let Me Tink About It


answered Dec 16, 2015 at 11:05

Pankaj Goyal


1 Comment

trusktr Over a year ago

How is this answer better than the one by Let Me Tink About It?

trusktr · Accepted Answer · 2021-03-12 04:17:26Z

4

Use this:

const re = /<script\b[^>]*>[\s\S]*?<\/script\b[^>]*>/g

Use it like this:

const html = `
  ...
  
    <script type="text/javascript">
        alert('1');
    </script>

    <div>Test</div>

    <script type="text/javascript">
        alert('2');
    </script>

  ...
`

const re = /<script\b[^>]*>[\s\S]*?<\/script\b[^>]*>/g

const results = html.match(re)

console.log(results) // an array containing each script tag.

See that specific regex in action and learn about it here:

https://regexr.com/5od96

The Regexr site is the most useful regex site! Hover on any part of the regex and it'll tell you about it, plus so much more. Also save and explore regexes other people made.
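If you also want the code inside each tag, one possible variation is to add a capture group and iterate with `matchAll`. This is a sketch, not part of the original regex: the capture group, the `i` flag, and the names `scriptWithBody`, `page`, and `bodies` are my additions.

```javascript
// Same shape as the regex above, plus a capture group for the body and the i flag.
const scriptWithBody = /<script\b[^>]*>([\s\S]*?)<\/script\b[^>]*>/gi;

const page = `
  <script type="text/javascript">
      alert('1');
  </script>
  <div>Test</div>
  <SCRIPT>
      alert('2');
  </SCRIPT>
`;

// matchAll requires the g flag and yields one match object per script element.
const bodies = Array.from(page.matchAll(scriptWithBody), (m) => m[1].trim());

console.log(bodies); // the two bodies: alert('1'); and alert('2');
```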

answered Mar 12, 2021 at 4:17

trusktr


1 Comment

Eduard Void Over a year ago

thanks a lot. this saves me a lot of time. the best and the only working solution for all my use-cases

Jason Williams · Accepted Answer · 2018-06-07 17:44:41Z

3

The regex based solution I would recommend is the following:

// Combine the flags with | (bitwise OR); using & here evaluates to RegexOptions.None.
Regex rMatch = new Regex(@"<script[^>]*>(.*?)</script[^>]*>", RegexOptions.IgnoreCase | RegexOptions.Singleline);
myString = rMatch.Replace(myString, "");

This regex will correctly identify and remove script tags in the following strings:

<script></script>
<script>something...</script>
something...<ScRiPt>something...</scripT>something...
something...<ScRiPt something...="something...">something...</scripT something...>something...

As a bonus, it will not match either of the following invalid script strings:

< script></script>
<javascript>something...</javascript>

edited Jun 7, 2018 at 17:44

answered Oct 9, 2014 at 21:16

Jason Williams


5 Comments

yardpenalty.com Over a year ago

Hey Jason, how would you use negative lookahead with this regex? i.e.: Not this.

Zectbumo Over a year ago

Whoops! You and your upvoters just got... <scr<script></script>ipt>alert("p0w3nd!")</script>

Jason Williams Over a year ago

@Zectbumo In my opinion, your string should be validated as true. Only valid script tags will be parsed as JavaScript by the browser. Badly formatted strings will be treated like text. So, your string is not dangerous in any way.
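For what it's worth, the concern in Zectbumo's comment seems to matter when the regex is used to remove matches, as the answer does with Replace. The JavaScript sketch below shows that a single removal pass over that string leaves a well-formed script tag behind, and that repeating the removal until nothing changes handles this particular input. The helper name `removeScriptsRepeatedly` is hypothetical, and this is not meant as a complete XSS defense.

```javascript
// JavaScript equivalent of the answer's pattern ([\s\S] stands in for Singleline).
const scriptElement = /<script[^>]*>[\s\S]*?<\/script[^>]*>/gi;
const input = '<scr<script></script>ipt>alert("p0w3nd!")</script>';

// One pass removes the inner pair and stitches the outer pieces together.
const onePass = input.replace(scriptElement, "");
console.log(onePass); // <script>alert("p0w3nd!")</script>

// Hypothetical helper: keep removing until the string stops changing.
function removeScriptsRepeatedly(str) {
  let previous;
  let current = str;
  do {
    previous = current;
    current = previous.replace(scriptElement, "");
  } while (current !== previous);
  return current;
}

const stable = removeScriptsRepeatedly(input);
console.log(JSON.stringify(stable)); // ""
```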

revelt Over a year ago

Sorry you forgot line breaks, this regex won't register if input has line breaks: <script>tra la\n la</script>

trusktr Over a year ago

That is not JavaScript code and the original poster asked for JavaScript in ALL CAPS.

bodhizero · Accepted Answer · 2013-05-17 20:49:39Z

2

A negated character class comes in handy here.

<[^>]*script

answered May 17, 2013 at 20:49

bodhizero


2 Comments

Ajay Kulkarni Over a year ago

Thanks bodhizero. I also found something similar, (%3C*|<)[^*]?script

Muhammad Umer Over a year ago

str.includes('<script')
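A slightly expanded sketch of that idea, assuming you only care whether an opening "<script" appears anywhere, regardless of case. The helper name `hasScriptOpenTag` is made up for illustration.

```javascript
// Hypothetical helper; lowercasing the input makes the check case-insensitive.
function hasScriptOpenTag(str) {
  return str.toLowerCase().includes("<script");
}

console.log(hasScriptOpenTag("This is a simple <SCRIPT type=text/javascript> string")); // true
console.log(hasScriptOpenTag("This is a simple < script> string"));                     // false
console.log(hasScriptOpenTag("no tags here"));                                          // false
```

This is a plain substring check, so it would also flag text such as "<scripted>", and it does not look for a closing tag.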

AztecCodes · Accepted Answer · 2024-01-26 20:28:02Z

0

The following Java code takes care of most issues with whitespace and case insensitivity.

Java Code:

import java.util.regex.Pattern;

public class ScriptTagSanitizer {
    // Matches opening and closing script tags, allowing whitespace after "<" or "</"
    // and any attributes before ">". Only the tags are removed, not the text between them.
    private static final Pattern REMOVE_SCRIPT_TAGS =
            Pattern.compile("</?\\s*script\\b[^>]*>", Pattern.CASE_INSENSITIVE);

    public static void main(String[] args) {
        String testStr = "Test<scrIpt test= \"jjj/sdf \" >1</ script  > < script >2</ script > < ScrIpt > alert(1) <1g  >2gb </ script > <br> <Script > </ SCRIPT> replace";

        System.out.println(sanitize_html_script_tags(testStr));
    }

    public static String sanitize_html_script_tags(String inStr) {
        // Skip the regex entirely when the input cannot contain a tag.
        if (!inStr.isEmpty() && inStr.contains("<") && inStr.contains(">")) {
            return REMOVE_SCRIPT_TAGS.matcher(inStr).replaceAll("");
        }
        return inStr;
    }
}

edited Jan 26, 2024 at 20:28

AztecCodes


answered Jan 20, 2024 at 19:53

M Patil


1 Comment

hazelcodes Over a year ago

The answer you have provided is unrelated to the user's question. The user requires a solution using "JavaScript", not "Java".
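Since the question asks for JavaScript, a rough port of the same idea might look like the sketch below. It is not a line-for-line translation: the names `SCRIPT_TAGS` and `sanitizeHtmlScriptTags` are hypothetical, and it uses `[^>]*` for the attribute part so that attributes containing spaces or quotes are covered.

```javascript
// Matches opening and closing script tags, tolerating whitespace after "<" or "</".
const SCRIPT_TAGS = /<\/?\s*script\b[^>]*>/gi;

// Hypothetical helper: strips the tags themselves, leaving the text between them.
function sanitizeHtmlScriptTags(input) {
  return input.replace(SCRIPT_TAGS, "");
}

const testStr =
  'Test<scrIpt test= "jjj/sdf " >1</ script  > < script >2</ script > ' +
  "< ScrIpt > alert(1) <1g  >2gb </ script > <br> <Script > </ SCRIPT> replace";

const cleaned = sanitizeHtmlScriptTags(testStr);
console.log(cleaned);
```

As with the Java version, only the tags are removed; content such as alert(1) stays in the string as plain text.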

Theodore Xu · Accepted Answer · 2018-12-10 08:02:01Z

-2

I think this one definitely works for me.

var regexp = /<script+.*>+.*<\/script>/g;

answered Dec 10, 2018 at 8:02

Theodore Xu


1 Comment

revelt Over a year ago

It does not work if there are line breaks between script tags! Additionally, the plus in opening script does not make sense, you're saying "one or more letter t" there. You probably meant to put brackets but it's a flawed regex.
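Both points in this comment can be checked quickly. The sketch below runs the answer's regex on a multi-line input and on a misspelled tag, then tries two possible fixes: the `s` (dotAll) flag, available since ES2018, and the `[\s\S]` character class. The variable names are illustrative only.

```javascript
// The regex from the answer (g flag dropped because we only call test()).
const original = /<script+.*>+.*<\/script>/;
const multiline = "<script>\n  alert(1);\n</script>";

// "." does not match line breaks by default, so this fails.
const originalOnMultiline = original.test(multiline);
// "t+" means one or more "t", so a misspelled tag name is accepted.
const originalOnTypo = original.test("<scripttt>x</script>");
console.log(originalOnMultiline, originalOnTypo); // false true

// Possible fix 1: the s (dotAll) flag lets "." match line breaks.
const withDotAll = /<script\b.*?>.*?<\/script>/s.test(multiline);
// Possible fix 2: [\s\S] matches any character, including line breaks.
const withCharClass = /<script\b[\s\S]*?>[\s\S]*?<\/script>/.test(multiline);
console.log(withDotAll, withCharClass); // true true
```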
