Regex to get all JavaScript tags in C#

Question

I'm looking for a regex that will let me get all JavaScript and CSS link tags in a string, so that I can strip certain tags from a DotNetNuke (yeah, I know... ouch!) page in an overridden Render method.

I know about the HTML Agility Pack, and I've even read Jeff Atwood's blog entry, but unfortunately I don't have the luxury of a 3rd party library.

Any help would be appreciated.

Edit: I tried this to match a JavaScript entry, but it didn't work. Regexes are a dark art to me.

updatedPageSource = Regex.Replace(
    pageSource,
    String.Format("<script type=\"text/javascript\" src=\".*?{0}\"></script>", name),
    "",
    RegexOptions.IgnoreCase);
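The question doesn't show the actual page markup, so this is only a guess at one way the attempt can fail: the pattern requires type before src, with exactly one space between them, so a tag written the other way round is silently skipped. A minimal console sketch (C# 9 top-level statements; the StripScript helper and the sample tags are hypothetical):

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical wrapper around the question's code. "name" is NOT escaped,
// exactly as in the question.
static string StripScript(string pageSource, string name) =>
    Regex.Replace(
        pageSource,
        String.Format("<script type=\"text/javascript\" src=\".*?{0}\"></script>", name),
        "",
        RegexOptions.IgnoreCase);

// Attribute order and spacing exactly as the pattern expects: the tag is removed.
string typeFirst = "<script type=\"text/javascript\" src=\"/js/myfile.js\"></script>";
// Same tag with src before type: the pattern never matches, so nothing is removed.
string srcFirst = "<script src=\"/js/myfile.js\" type=\"text/javascript\"></script>";

Console.WriteLine(StripScript(typeFirst, "myfile.js").Length);     // 0
Console.WriteLine(StripScript(srcFirst, "myfile.js") == srcFirst); // True
```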

"unfortunately I don't have the luxury of a 3rd party library." Care to explain why? – moinudin, commented Feb 11, 2011 at 13:54
@marcog I'm working on a project that has to be finished today. If I introduce a 3rd party solution, I have to get it checked to see if it's OK. – James South, commented Feb 11, 2011 at 14:02

Mitchel Sellers · Accepted Answer · 2011-02-11 15:22:19Z

1

I have a few comments on this. Your regex is close; the following has been tested and works:

<script type="text/javascript" src=".*myfile.js"></script>

I used the following test inputs:

<script type="text/javascript" src="myfile.js"></script>
<script type="text/javascript" src="/test/myfile.js"></script>
<script type="text/javascript" src="/test/Looky/myfile.js"></script>

However, I would caution against this approach: it takes time to parse, it can be error-prone, etc.
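A minimal console sketch that reruns the three test inputs against the posted pattern, plus one made-up input (two tags on the same line) illustrating the kind of error-prone behavior mentioned above. The sample strings are assumptions, not taken from a real DotNetNuke page:

```csharp
using System;
using System.Text.RegularExpressions;

// The pattern from this answer, as posted (greedy .* and an unescaped dot).
const string pattern = "<script type=\"text/javascript\" src=\".*myfile.js\"></script>";

string[] testInputs =
{
    "<script type=\"text/javascript\" src=\"myfile.js\"></script>",
    "<script type=\"text/javascript\" src=\"/test/myfile.js\"></script>",
    "<script type=\"text/javascript\" src=\"/test/Looky/myfile.js\"></script>",
};

foreach (string input in testInputs)
{
    Console.WriteLine(Regex.IsMatch(input, pattern)); // True for all three
}

// Hypothetical failure case: two tags on one line. The match starts at the
// first tag and the greedy .* runs on into the second, so both are removed.
string oneLine =
    "<script type=\"text/javascript\" src=\"other.js\"></script>" +
    "<script type=\"text/javascript\" src=\"myfile.js\"></script>";
Console.WriteLine(Regex.Replace(oneLine, pattern, "") == ""); // True: other.js is gone too
```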

answered Feb 11, 2011 at 15:22

Mitchel Sellers



MarioVW · 2011-02-11 16:13:41Z

1

DISCLAIMER: Regex + HTML = ouch!

Your problem may be that you are not escaping the regex metacharacters in name (e.g. the dot metacharacter '.', which matches any character). You may want to try this:

updatedPageSource = Regex.Replace(
    pageSource, 
    String.Format("<script\\s+type=\"text/javascript\"\\s+src=\".*?{0}\"\\s*>\\s*</script>", Regex.Escape(name)),
    "",
    RegexOptions.IgnoreCase);

// Just one of the many reasons why you don't mix Regex with HTML:
updatedPageSource = Regex.Replace(
    updatedPageSource, 
    String.Format("<script\\s+src=\".*?{0}\"\\s+type=\"text/javascript\"\\s*>\\s*</script>", Regex.Escape(name)),
    "",
    RegexOptions.IgnoreCase);

I also allowed for extra whitespace here and there (\s+ between attributes, \s* before the > and before </script>).
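A self-contained sketch of the two passes above, wrapped in a hypothetical RemoveScript helper and run against a made-up page fragment. The third tag (myfileXjs) is only there to show what Regex.Escape buys you: with an unescaped name, the '.' would match the X and that tag would be stripped as well.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper wrapping the two Regex.Replace calls from this answer.
static string RemoveScript(string pageSource, string name)
{
    string escaped = Regex.Escape(name); // "myfile.js" becomes "myfile\.js"

    // Pass 1: type="..." before src="..."
    string result = Regex.Replace(
        pageSource,
        String.Format("<script\\s+type=\"text/javascript\"\\s+src=\".*?{0}\"\\s*>\\s*</script>", escaped),
        "",
        RegexOptions.IgnoreCase);

    // Pass 2: src="..." before type="..."
    return Regex.Replace(
        result,
        String.Format("<script\\s+src=\".*?{0}\"\\s+type=\"text/javascript\"\\s*>\\s*</script>", escaped),
        "",
        RegexOptions.IgnoreCase);
}

string page =
    "<head>\n" +
    "<script type=\"text/javascript\" src=\"/js/myfile.js\"></script>\n" +
    "<script src=\"/js/myfile.js\"  type=\"text/javascript\" ></script>\n" +
    "<script type=\"text/javascript\" src=\"/js/myfileXjs\"></script>\n" +
    "</head>";

// Both myfile.js tags are removed; only the myfileXjs tag is left.
Console.WriteLine(RemoveScript(page, "myfile.js"));
```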

edited Feb 11, 2011 at 16:13

answered Feb 11, 2011 at 15:33

MarioVW


3 Comments

Justin Morgan Over a year ago

Watch out for the greedy .* in your code. That will match all the way to the last </script> tag it can find. You want .*?.

MarioVW Over a year ago

Thanks, we learn something new every day. For reference: Regex: Greedy vs Lazy
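To make the greedy/lazy difference concrete, a small sketch on made-up markup. '.' does not match a newline unless RegexOptions.Singleline is set, so the effect shows up when tags share a line. One caveat, shown at the end: lazy matching does not stop a match that starts at an earlier tag from running into a later one, because the engine still returns the leftmost match; a negated class such as [^"]* that cannot cross the closing quote is one common way to tighten that up.

```csharp
using System;
using System.Text.RegularExpressions;

// Two script blocks on a single line (hypothetical markup).
string html = "<script>a()</script><script>b()</script>";

// Greedy: .* runs to the end of the line and backs off only to the LAST </script>.
MatchCollection greedy = Regex.Matches(html, "<script>.*</script>");
// Lazy: .*? stops at the FIRST </script> that lets the rest of the pattern match.
MatchCollection lazy = Regex.Matches(html, "<script>.*?</script>");

Console.WriteLine(greedy.Count); // 1 (the whole string)
Console.WriteLine(lazy.Count);   // 2

// Lazy is not a complete fix: the match attempt at the a.js tag keeps
// expanding until it finds b.js, so it swallows both tags.
string twoTags = "<script src=\"a.js\"></script><script src=\"b.js\"></script>";
string lazyHit = Regex.Match(twoTags, "<script src=\".*?b\\.js\"></script>").Value;
// [^"]* cannot cross the closing quote, so only the b.js tag matches.
string tightHit = Regex.Match(twoTags, "<script src=\"[^\"]*b\\.js\"></script>").Value;

Console.WriteLine(lazyHit == twoTags);                           // True
Console.WriteLine(tightHit == "<script src=\"b.js\"></script>"); // True
```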

Justin Morgan Over a year ago

Oh...and your \s needs to be doubly escaped. Either that, or use @"...", but then you'll have to escape the " by doubling them. :)
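The two ways of writing the same pattern in C#, as a quick sketch (the pattern itself is only an illustration). In a regular string literal a lone \s does not even compile (error CS1009, unrecognized escape sequence), which is why it has to be written \\s; in a verbatim @"..." literal the backslash is kept as-is, but each " inside has to be doubled.

```csharp
using System;
using System.Text.RegularExpressions;

// Regular literal: backslashes doubled, quotes escaped as \".
string regular = "<script\\s+src=\"(.*?)\"";
// Verbatim literal: backslashes kept as-is, quotes written as "".
string verbatim = @"<script\s+src=""(.*?)""";

Console.WriteLine(regular == verbatim); // True: identical pattern text

string tag = "<script   src=\"/js/myfile.js\"></script>";
Console.WriteLine(Regex.Match(tag, verbatim).Groups[1].Value); // /js/myfile.js
```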

Justin Morgan · 2011-02-11 16:09:51Z

0

Don't forget to account for things like whitespace, other attributes, different attribute orders (e.g. src="foo" type="bar" vs. type="bar" src="foo"), and " vs. ' quoting. Maybe this?

@"<\s*script\b.*?\bsrc=(""|').*?{0}\1.*?(/>|>\s*</\s*script\s*>)"

I went ahead and took out the type attribute. If you have the filename, you know what type of script it is anyway; plus, this accounts for tags where the src attribute comes first, where the deprecated language attribute is used, or where type is omitted altogether (it's supposed to be there, but it isn't always). Note that I'm using the lazy .*? so that it doesn't match all the way to the last </script> in the page. The {0} is a String.Format placeholder for the file name, as in the question.
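A sketch of how this pattern might be used, assuming the {0} is filled in with String.Format and the file name is passed through Regex.Escape (both assumptions; the answer doesn't show the calling code). ScriptTagFor and the sample tags are hypothetical. There is deliberately no \b directly after \1 here: a word boundary cannot fall between a closing quote and the space or > that normally follows it, so a \b at that spot would keep every tag below from matching.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: builds the pattern for one file name.
// Regex.Escape keeps the '.' in "myfile.js" from matching any character.
static Regex ScriptTagFor(string fileName) =>
    new Regex(
        String.Format(
            @"<\s*script\b.*?\bsrc=(""|').*?{0}\1.*?(/>|>\s*</\s*script\s*>)",
            Regex.Escape(fileName)),
        RegexOptions.IgnoreCase);

// Attribute order, quoting, the language attribute and a self-closing tag all vary.
string[] tags =
{
    "<script type=\"text/javascript\" src=\"/js/myfile.js\"></script>",
    "<script src=\"/js/myfile.js\" type=\"text/javascript\"></script>",
    "<SCRIPT language='javascript' src='/js/myfile.js'></SCRIPT>",
    "<script src=\"myfile.js\" />",
};

Regex myFile = ScriptTagFor("myfile.js");
foreach (string tag in tags)
{
    Console.WriteLine(myFile.Replace(tag, "") == ""); // True for each tag
}
```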

answered Feb 11, 2011 at 16:09

Justin Morgan

