19

I need to check whether an incoming string contains the text <script.

Example:
string a = "This is a simple <script> string";

Now I need to write a regular expression that tells me whether this string contains a <script> tag.

I ended up writing something like: <* ?script.* ?>

The challenge is that the incoming string may contain script in any of the following forms:

string a = "This is a simple <script> string";
string a = "This is a simple < script> string";
string a = "This is a simple <javascript></javascript> string";
string a = "This is a simple <script type=text/javascript> string";

Hence the regular expression should first check for the opening < and then check for script.

7 Answers

54
Use:
/<script[\s\S]*?>[\s\S]*?<\/script>/gi

@bodhizero’s accepted answer of <[^>]*script incorrectly returns true under the following conditions:

// Not a proper script tag.
const a = "This is a simple < script> string"; 

// Space added before "img", otherwise the entire tag fails to render here.
const a = "This is a simple < img src='//example.com/script.jpg'> string";

// Picks up "nonsense code" just because a '<' character happens to precede a 'script' string somewhere along the way.
const a = "This is a simple for(i=0;i<5;i++){alert('script')} string";
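To make the comparison concrete, here is a minimal sketch that runs both patterns against a well-formed script element and the three strings above. The variable names (`strictPattern`, `loosePattern`, `samples`, `report`) are made up for illustration; only the two patterns come from the answers on this page. The `g` flag is left off deliberately, since `test()` on a global regex keeps state in `lastIndex` between calls.

```javascript
// Patterns copied from the answers; names are hypothetical.
const strictPattern = /<script[\s\S]*?>[\s\S]*?<\/script>/i;
const loosePattern = /<[^>]*script/i;

const samples = [
  "This is a simple <script>alert(1)</script> string",
  "This is a simple < script> string",
  "This is a simple < img src='//example.com/script.jpg'> string",
  "This is a simple for(i=0;i<5;i++){alert('script')} string",
];

// Record what each pattern reports for each sample.
const report = samples.map((s) => ({
  strict: strictPattern.test(s),
  loose: loosePattern.test(s),
}));

console.log(report);
```

The looser pattern flags all four strings, while the stricter one flags only the first. Note that the stricter pattern also requires a closing </script>, so an unclosed <script> like the one in the question's first example would not match it.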

Here is an excellent resource for building and testing regular expressions.

2 Comments

And what if URL (percent) encoding is used? Then the string must be validated against this regex too: /%3Cscript[\s\S]*?%3E[\s\S]*?%3C\/script%3E/gi. One more thing: the "i" flag is needed for case insensitivity.
This yields a "catastrophic backtracking" error when tried in regex101.com on my HTML
5

Try this:

/(<|%3C)script[\s\S]*?(>|%3E)[\s\S]*?(<|%3C)(\/|%2F)script[\s\S]*?(>|%3E)/gi
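As a rough sketch of how this behaves on percent-encoded input, and of an alternative that decodes first and then applies a plain pattern: `containsScriptAfterDecoding` is a hypothetical helper, not something from the answer, and it assumes the input was encoded at most once. `decodeURIComponent` throws a `URIError` on malformed sequences, so the sketch falls back to the raw string in that case.

```javascript
// Pattern from the answer above, without "g" so test() stays stateless.
const encodedAware =
  /(<|%3C)script[\s\S]*?(>|%3E)[\s\S]*?(<|%3C)(\/|%2F)script[\s\S]*?(>|%3E)/i;

// Hypothetical alternative: decode once, then look for a plain script element.
function containsScriptAfterDecoding(input) {
  let decoded = input;
  try {
    decoded = decodeURIComponent(input);
  } catch (e) {
    // Malformed percent sequence: keep the raw input.
  }
  return /<script\b[^>]*>[\s\S]*?<\/script\b[^>]*>/i.test(decoded);
}

const encoded = "%3Cscript%3Ealert(1)%3C%2Fscript%3E";
const regexHit = encodedAware.test(encoded);
const decodedHit = containsScriptAfterDecoding(encoded);
const plainHit = containsScriptAfterDecoding("100% plain text");

console.log(regexHit, decodedHit, plainHit);
```

Decoding first keeps the pattern itself simpler, but it only covers one layer of encoding.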

1 Comment

How is this answer better than the one by Let Me Tink About It?
4

Use this:

const re = /<script\b[^>]*>[\s\S]*?<\/script\b[^>]*>/g

Use it like this:

const html = `
  ...
  
    <script type="text/javascript">
        alert('1');
    </script>

    <div>Test</div>

    <script type="text/javascript">
        alert('2');
    </script>

  ...
`

const re = /<script\b[^>]*>[\s\S]*?<\/script\b[^>]*>/g

const results = html.match(re)

console.log(results) // an array containing each script tag.

See that specific regex in action and learn about it here:

https://regexr.com/5od96

The Regexr site is the most useful regex site! Hover on any part of the regex and it'll tell you about it, plus so much more. Also save and explore regexes other people made.

1 Comment

thanks a lot. this saves me a lot of time. the best and the only working solution for all my use-cases
3

The regex based solution I would recommend is the following:

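// Combine the flags with | (bitwise OR); & would yield RegexOptions.None, so "." would not match line breaks.
Regex rMatch = new Regex(@"<script[^>]*>(.*?)</script[^>]*>", RegexOptions.IgnoreCase | RegexOptions.Singleline);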
myString = rMatch.Replace(myString, "");

This regex will correctly identify and remove script tags in the following strings:

<script></script>
<script>something...</script>
something...<ScRiPt>something...</scripT>something...
something...<ScRiPt something...="something...">something...</scripT something...>something...

Bonus, it will not match on any of the following invalid script strings:

< script></script>
<javascript>something...</javascript>

5 Comments

Hey Jason, how would you use a negative lookahead with this regex? I.e., to not match this.
Whoops! You and your upvoters just got... <scr<script></script>ipt>alert("p0w3nd!")</script>
@Zectbumo In my opinion, your string should be validated as true. Only valid script tags will be parsed as JavaScript by the browser. Badly formatted strings will be treated like text. So, your string is not dangerous in any way.
Sorry you forgot line breaks, this regex won't register if input has line breaks: <script>tra la\n la</script>
That is not JavaScript code and the original poster asked for JavaScript in ALL CAPS.
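The nested string from the comment above is a problem for removal rather than detection: a single replace pass deletes the inner <script></script> and leaves a new script element behind. A minimal JavaScript sketch of one way to handle that case, assuming the same pattern ported to a JS regex literal, is to repeat the replacement until the string stops changing. The function names are hypothetical, and this only addresses the nested case shown in the comment.

```javascript
// JavaScript port of the pattern above; [\s\S] stands in for "." with Singleline.
const scriptTag = /<script[^>]*>[\s\S]*?<\/script[^>]*>/gi;

// One pass, equivalent to a single Replace call.
function stripScriptsOnce(input) {
  return input.replace(scriptTag, "");
}

// Repeat until a pass changes nothing, so tags reassembled by a removal are caught.
function stripScriptsUntilStable(input) {
  let previous;
  let current = input;
  do {
    previous = current;
    current = stripScriptsOnce(current);
  } while (current !== previous);
  return current;
}

const nested = '<scr<script></script>ipt>alert("p0w3nd!")</script>';
const afterOnePass = stripScriptsOnce(nested);
const afterLoop = stripScriptsUntilStable(nested);

console.log(JSON.stringify(afterOnePass));
console.log(JSON.stringify(afterLoop));
```

After one pass the result is itself a complete script element; the loop removes it on the second pass.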
2

A negated character class comes in handy here.

<[^>]*script

2 Comments

Thanks bodhizero. I also found something similar, (%3C*|<)[^*]?script
str.includes('<script')
0

The following Java code takes care of most issues with extra spaces and case insensitivity.

Java Code:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public static void main(String[] args) {
    String testStr = "Test<scrIpt test= \"jjj/sdf \" >1</ script  > < script >2</ script > < ScrIpt > alert(1) <1g  >2gb </ script > <br> <Script > </ SCRIPT> replace";

    // Print the sanitized result instead of discarding it.
    System.out.println(sanitize_html_script_tags(testStr));
}

public static String sanitize_html_script_tags(String inStr) {
    // Only run the regex when the input could contain a tag at all.
    if (!inStr.isEmpty() && inStr.contains("<") && inStr.contains(">")) {
        // Matches opening and closing script tags, allowing spaces around the tag name.
        Pattern REMOVE_SCRIPT_TAGS = Pattern.compile("</?\\s*script\\s*[a-zA-Z=/\"]*\\s*>", Pattern.CASE_INSENSITIVE);
        Matcher m = REMOVE_SCRIPT_TAGS.matcher(inStr);
        return m.replaceAll("");
    }
    return inStr;
}

1 Comment

The answer you have provided is unrelated to the user's question. The user requires a solution using "JavaScript", not "Java".
-2

I think this one definitely works for me.

var regexp = /<script+.*>+.*<\/script>/g;

1 Comment

It does not work if there are line breaks between script tags! Additionally, the plus in opening script does not make sense, you're saying "one or more letter t" there. You probably meant to put brackets but it's a flawed regex.
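Both points in the comment above can be checked directly. The sketch below is only an illustration: `original` is the regex from the answer (without `g`, so `test()` stays stateless), and `adjusted` is one possible rewrite, not the answerer's, that uses `[\s\S]` to cross line breaks and `\b` so that extra letters after "script" do not match.

```javascript
// Regex from the answer: "t+" means one or more "t", and "." stops at line breaks.
const original = /<script+.*>+.*<\/script>/;

// One possible adjustment (an assumption, not the answerer's pattern).
const adjusted = /<script\b[^>]*>[\s\S]*?<\/script\s*>/i;

const multiline = "<script>\nalert(1);\n</script>";
const extraLetters = "<scripttt>x</script>";

const checks = {
  originalMultiline: original.test(multiline),
  originalExtraLetters: original.test(extraLetters),
  adjustedMultiline: adjusted.test(multiline),
  adjustedExtraLetters: adjusted.test(extraLetters),
};

console.log(checks);
```

The original misses the multi-line script and accepts <scripttt>, while the adjusted pattern does the opposite on both.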
