Using JavaScript and regex to modify the end of a URL

Question

I am trying to clean a URL (an RSS feed) so that there are no further characters after the last .rss (or .html). I'm using the TryIt Editor on w3schools.com for testing. The following is my test code:

var str="http://rss.cnn.com/rss/cnn_world.rsstest";
var patt1=/(.*[.rss|.html]).*/g;
var result = str.replace(patt1, "$1");
document.write(result);

The problem I am having is that the result shown is

http://rss.cnn.com/rss/cnn_world.rsstest

i.e. the "test" didn't get removed. I am wondering if someone could check my regex and explain what I am doing wrong?

Thank you.

Lose the [], and escape the . as \. (Note this will also kill any query string params ...)
Alex K., commented Aug 6, 2012 at 15:06
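To illustrate Alex K.'s caveat about query strings, here is a minimal sketch using the corrected pattern from the answers below. The helper name stripAfterExtension and the example.com URL are hypothetical, and console.log stands in for document.write so it runs in Node or a browser console:

```javascript
// Hypothetical helper: keep the first ".rss" or ".html" (captured as $1)
// and drop everything after it.
function stripAfterExtension(url) {
  return url.replace(/(\.rss|\.html).*$/, "$1");
}

console.log(stripAfterExtension("http://rss.cnn.com/rss/cnn_world.rsstest"));
// "http://rss.cnn.com/rss/cnn_world.rss"

// Caveat: a query string after the extension is removed as well.
console.log(stripAfterExtension("http://example.com/feed.rss?format=xml"));
// "http://example.com/feed.rss"
```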

Andrew Cheong · Accepted Answer · 2012-08-06 15:34:04Z

2

Firstly, I recommend jsFiddle or some other testing service. Forgive my bias.

Some other answerers seem to have completely missed the point, so to explain your errors:

[] does not group; it defines a character class. What you've written actually matches a single character, namely any one of these: .|hlmrst.
Without the $ anchor the two .*s may not match what you'd expect.
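A minimal sketch of why the question's pattern leaves the URL unchanged, using the question's own regex (console.log is used instead of document.write, assuming Node or a browser console; the variable names are illustrative):

```javascript
// [.rss|.html] is a character class: it matches ONE character from the set
// { ".", "r", "s", "|", "h", "t", "m", "l" } (duplicates are ignored).
var oneChar = /[.rss|.html]/g;
console.log("cnn_world.rsstest".match(oneChar).join("")); // "rl.rsstst"

// The question's pattern, without the g flag so exec() returns the first match.
var str = "http://rss.cnn.com/rss/cnn_world.rsstest";
var m = /(.*[.rss|.html]).*/.exec(str);

// Group 1's greedy .* runs to the end of the string, then backs off one
// character at a time until the class can match. The URL already ends in "t",
// which is in the class, so group 1 captures the whole URL and the trailing
// .* matches nothing.
console.log(m[1] === str); // true
console.log(str.replace(/(.*[.rss|.html]).*/g, "$1") === str); // true
```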

Try instead (with the same "$1" replacement):

/(\.rss|\.html).*$/g

Here's the jsFiddle demo.
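Assuming the same "$1" replacement as in the question, the corrected pattern can be sketched like this (console.log instead of document.write; the .html URL is a made-up example):

```javascript
var str = "http://rss.cnn.com/rss/cnn_world.rsstest";

// \. matches a literal dot, and (\.rss|\.html) is a real group with
// alternation. $1 puts the matched extension back, so only the text after it
// is dropped.
var result = str.replace(/(\.rss|\.html).*$/g, "$1");
console.log(result); // "http://rss.cnn.com/rss/cnn_world.rss"

// The same pattern on an .html URL with a fragment.
console.log("http://example.com/news/index.html#top".replace(/(\.rss|\.html).*$/g, "$1"));
// "http://example.com/news/index.html"
```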

edited Aug 6, 2012 at 15:34

answered Aug 6, 2012 at 15:07

Andrew Cheong


2 Comments

Churk Over a year ago

Your answer is great. I work with Sunny and we are both working on this issue; we are looking for the last instance of .rss or .html. Take http://rss.cnn.com/rss/cnn_world.rss/cnn_world.rsstest as a second test: your regex matches the first occurrence, so the .* also removes the second one. Is there a way to make it greedy (the opposite of the lazy ? modifier) so that it matches the last occurrence?

Andrew Cheong Over a year ago

@Churk - Indeed there is. Use a negative lookahead assertion: /(\.rss|\.html)(?!.*(?:\.rss|\.html)).*$/g. See jsfiddle.net/2f4jx.
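A sketch of two ways to keep only the last .rss or .html. Option A is a negative lookahead with the whole alternation grouped inside it; Option B swaps in a different technique, a greedy leading .* that pushes the captured prefix out to the last extension. The variable names and the short mixed test string are illustrative only:

```javascript
var url = "http://rss.cnn.com/rss/cnn_world.rss/cnn_world.rsstest";

// Option A: negative lookahead. A candidate extension is accepted only if no
// later ".rss" or ".html" appears anywhere in the rest of the string.
var lastExtLookahead = /(\.rss|\.html)(?!.*(?:\.rss|\.html)).*$/;

// Option B: greedy prefix. The leading .* grabs as much as it can, so group 1
// ends at the LAST ".rss" or ".html" in the string.
var lastExtGreedy = /^(.*(?:\.rss|\.html)).*$/;

console.log(url.replace(lastExtLookahead, "$1"));
// "http://rss.cnn.com/rss/cnn_world.rss/cnn_world.rss"
console.log(url.replace(lastExtGreedy, "$1"));
// "http://rss.cnn.com/rss/cnn_world.rss/cnn_world.rss"

// Mixed extensions: only the text after the final ".html" is removed.
console.log("a.html/b.rss/c.htmlx".replace(lastExtLookahead, "$1")); // "a.html/b.rss/c.html"
console.log("a.html/b.rss/c.htmlx".replace(lastExtGreedy, "$1"));    // "a.html/b.rss/c.html"
```

Option B avoids the lookahead entirely, which some readers may find easier to follow.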

Nhu Trinh · Accepted Answer · 2012-08-06 15:10:53Z

1

It should be var patt1=/(\.rss|\.html).*$/g; because . is a special character (unescaped, it matches any character).

edited Aug 6, 2012 at 15:10

answered Aug 6, 2012 at 15:05

Nhu Trinh


2 Comments

Churk Over a year ago

Please see this: stackoverflow.com/questions/9466768/… The "." is treated literally within [].

Churk Over a year ago

@Stano My comment was referring to "because . is special character" and to what the OP originally posted, var patt1=/(.*[.rss|.html]).*/g;. The dot inside [] is not a special character; the [] makes it literal.
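A quick check of both claims in this thread, assuming Node or a browser console: an unescaped . outside [] matches any character, while inside [] it is literal.

```javascript
// Outside a character class, an unescaped . matches any single character
// (except line terminators), so it also matches "X" here.
console.log(/cnn_world.rss/.test("cnn_worldXrss"));  // true
console.log(/cnn_world\.rss/.test("cnn_worldXrss")); // false

// Inside [], the dot is literal: [.] matches only a "." character.
console.log(/[.]/.test("X")); // false
console.log(/[.]/.test(".")); // true
```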

Nikolaj Zander · Accepted Answer · 2012-08-06 15:30:47Z

1

Try using substring

string.substring(from, to)

and the lastIndexOf method

string.lastIndexOf(searchvalue)

and combine them:

var result = str.substring(0, str.toLowerCase().lastIndexOf("rss") + 3);

Finally, to handle both .rss and .html:

var lower = str.toLowerCase();
var rssPos = lower.lastIndexOf(".rss");
var htmlPos = lower.lastIndexOf(".html");
var result = str; // leave the URL unchanged if neither extension occurs
if (rssPos > htmlPos) {
  result = str.substring(0, rssPos + 4);   // keep ".rss"
} else if (htmlPos !== -1) {
  result = str.substring(0, htmlPos + 5);  // keep ".html"
}
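The same lastIndexOf idea wrapped as a reusable function, as a sketch. The name trimAfterLastFeedExtension and the example.com URLs are hypothetical; it searches case-insensitively but slices the original string, so the URL's own casing is kept:

```javascript
// Hypothetical helper: cut url just after the last ".rss" or ".html"
// (case-insensitive search), or return it unchanged if neither occurs.
function trimAfterLastFeedExtension(url) {
  var lower = url.toLowerCase();
  var rssPos = lower.lastIndexOf(".rss");
  var htmlPos = lower.lastIndexOf(".html");
  if (rssPos === -1 && htmlPos === -1) {
    return url; // nothing to trim
  }
  // Whichever extension occurs later in the string wins.
  return rssPos > htmlPos
    ? url.substring(0, rssPos + ".rss".length)
    : url.substring(0, htmlPos + ".html".length);
}

console.log(trimAfterLastFeedExtension("http://rss.cnn.com/rss/cnn_world.rss/cnn_world.rsstest"));
// "http://rss.cnn.com/rss/cnn_world.rss/cnn_world.rss"
console.log(trimAfterLastFeedExtension("http://example.com/Index.HTML?x=1"));
// "http://example.com/Index.HTML"
console.log(trimAfterLastFeedExtension("http://example.com/feed"));
// "http://example.com/feed"
```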

edited Aug 6, 2012 at 15:30

answered Aug 6, 2012 at 15:09

Nikolaj Zander


3 Comments

Nikolaj Zander Over a year ago

edited to get rid of lower or upper case and my other errors ;-)

Nikolaj Zander Over a year ago

You can check whether .rss or .html is at the end by comparing the indexes of both, and then decide which string to use in lastIndexOf.

Brian Ustas Over a year ago

There are many ways to skin a cat. I think a regex-based solution is more appropriate here. Interesting, though!

Hans Hohenfeld · Accepted Answer · 2012-08-06 15:06:16Z

0

Why not simply do:

var str="http://rss.cnn.com/rss/cnn_world.rsstest";
str = str.replace(/test$/, "");
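Two details worth seeing in isolation, as a sketch assuming Node or a browser console: replace returns a new string rather than changing str, and /test$/ removes only that literal suffix:

```javascript
var str = "http://rss.cnn.com/rss/cnn_world.rsstest";

// String.prototype.replace does not modify str (strings are immutable);
// it returns a new string, so the result has to be assigned.
str.replace(/test$/, "");
console.log(str); // still ends in "rsstest"

str = str.replace(/test$/, "");
console.log(str); // "http://rss.cnn.com/rss/cnn_world.rss"

// /test$/ only removes that literal suffix; any other trailing text survives.
console.log("http://rss.cnn.com/rss/cnn_world.rssfoo".replace(/test$/, ""));
// "http://rss.cnn.com/rss/cnn_world.rssfoo"
```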

answered Aug 6, 2012 at 15:06

Hans Hohenfeld


2 Comments

SunN Over a year ago

"test" is just my sample; ideally it should remove anything after .rss.

Nhu Trinh Over a year ago

Because it may not be "test"; it can be anything.
