
I am trying to clean a URL (an RSS feed) so that there are no further characters after the last .rss (or .html). I'm using the TryIt Editor on w3schools.com for testing. The following is my test code:

var str="http://rss.cnn.com/rss/cnn_world.rsstest";
var patt1=/(.*[.rss|.html]).*/g;
var result = str.replace(patt1, "$1");
document.write(result);

The problem I am having is that the result shown is

http://rss.cnn.com/rss/cnn_world.rsstest

i.e. the "test" didn't get removed. I am wondering if someone could check my regex and explain what I am doing wrong?

Thank you.

  • Lose the [], escape the . to \. (Note this will also kill any query string params ...) Commented Aug 6, 2012 at 15:06
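To illustrate the caveat in that comment, here is a small sketch using a made-up feed URL with a query string (the URL is hypothetical). The escaped, class-free pattern keeps .rss and drops everything after it, including the ? parameters.

```javascript
// Hypothetical feed URL with a query string, used only for illustration.
var feedWithQuery = "http://example.com/feed.rss?edition=intl";

// Escaped dots inside a real group; $1 restores the matched extension and
// .*$ throws away everything after it, query string included.
var cleanedFeed = feedWithQuery.replace(/(\.rss|\.html).*$/, "$1");

console.log(cleanedFeed); // http://example.com/feed.rss
```

If query parameters must survive, this pattern is not the right tool as written.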

4 Answers


Firstly, I recommend jsFiddle or some other testing service. Forgive my bias.

Some other answerers seem to have completely missed the point, so to explain your errors:

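  1. [] does not group; it defines a character class. What you've written actually matches a single character, namely any one of these: .|hlmrst.
  2. The leading .* is greedy, so it consumes as much as it can, and the class then only has to match one more character, here the final t of test. The group therefore spans the whole string and the trailing .* has nothing left to remove. Anchoring the pattern with $ also makes the intent (strip through the end of the string) explicit.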
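Both points can be checked directly. This sketch reuses the question's string and pattern, shows that the class matches single characters (including |), and shows that the greedy group already covers the whole input, so replace() hands the string back unchanged.

```javascript
var str = "http://rss.cnn.com/rss/cnn_world.rsstest";
var patt1 = /(.*[.rss|.html]).*/g;

// [.rss|.html] is a class of single characters: . | h l m r s t
var charClass = /^[.rss|.html]$/;
console.log(charClass.test("|"));    // true: | is just another member
console.log(charClass.test("e"));    // false: e is not in the set
console.log(charClass.test(".rss")); // false: the class matches one character only

// Greedy .* runs to the end, then gives back one character so the class
// can match the final "t" of "test"; group 1 is the entire string.
var match = /(.*[.rss|.html]).*/.exec(str);
console.log(match[1] === str); // true

// Hence the replacement puts the whole string back.
console.log(str.replace(patt1, "$1") === str); // true
```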

Try instead:

/(\.rss|\.html).*$/g

Here's the jsFiddle demo.
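For completeness, here is the question's test code with the suggested pattern dropped in (console.log stands in for document.write so it also runs outside a browser). The /g flag is omitted as an editorial choice: .*$ consumes the rest of the string, so there is only ever one match.

```javascript
var str = "http://rss.cnn.com/rss/cnn_world.rsstest";

// A real group (...) with escaped dots instead of a character class [...].
// $1 puts the matched extension back; .*$ discards whatever follows it.
var patt1 = /(\.rss|\.html).*$/;
var result = str.replace(patt1, "$1");
console.log(result); // http://rss.cnn.com/rss/cnn_world.rss

// The same pattern handles .html (hypothetical URL).
var pageResult = "http://example.com/page.htmlextra".replace(patt1, "$1");
console.log(pageResult); // http://example.com/page.html
```

Note that this stops at the first .rss or .html in the string.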


2 Comments

Your answer is great. Since sunny and I work together on this issue, we are looking for the last instance of .rss or .html. Take http://rss.cnn.com/rss/cnn_world.rss/cnn_world.rsstest as a second test: your regex is not greedy, so it finds the first occurrence, and the .* then removes the second one. Is there a way to flag it to be greedy, the opposite of the lazy ? modifier?
@Churk - Indeed there is. Use a negative lookahead assertion: /(\.rss|\.html)(?!.*(?:\.rss|\.html)).*$/g. See jsfiddle.net/2f4jx.
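Two ways to keep everything up to the last .rss or .html, checked against Churk's second test URL. Option A is the lookahead approach, with the alternation wrapped in a non-capturing group (?:...). Option B is an alternative not taken from the thread: a leading greedy .* backs off only as far as the last extension. The .html test string is a made-up example.

```javascript
var twoFeeds = "http://rss.cnn.com/rss/cnn_world.rss/cnn_world.rsstest";
var twoPages = "http://example.com/a.html/b.htmltest"; // hypothetical

// Option A: negative lookahead. The (?:...) group matters: written as
// (?!.*\.rss|\.html), the | splits the lookahead into ".*\.rss" or "\.html"
// directly at this spot, so a later ".html" further along is not detected.
var lookahead = /(\.rss|\.html)(?!.*(?:\.rss|\.html)).*$/;

// Option B: a greedy .* consumes as much as it can, so group 1 ends at the
// LAST ".rss" or ".html" in the string.
var greedy = /^(.*(?:\.rss|\.html)).*$/;

console.log(twoFeeds.replace(lookahead, "$1"));
console.log(twoFeeds.replace(greedy, "$1"));
// both: http://rss.cnn.com/rss/cnn_world.rss/cnn_world.rss

console.log(twoPages.replace(lookahead, "$1"));
console.log(twoPages.replace(greedy, "$1"));
// both: http://example.com/a.html/b.html
```

Option B avoids repeating the extension list inside a lookahead, at the cost of anchoring the whole match to the start of the string.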

It should be var patt1=/(\.rss|\.html).*$/g; because . is a special character (it matches any character), so it must be escaped as \. to match a literal dot.

2 Comments

Please see this: stackoverflow.com/questions/9466768/… The "." is taken literally within [].
@Stano My comment was referring to "because . is a special character" and to what the OP originally posted: var patt1=/(.*[.rss|.html]).*/g;. The dot inside [] is not a special character; the [] makes it literal.
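The point debated in these comments is easy to check: outside a character class an unescaped . matches any character, while \. and [.] both match only a literal dot. The test strings below are made up.

```javascript
// Unescaped . outside [] is a wildcard, so ".rss" also matches "xrss".
var wildcard = /.rss/.test("cnn_worldxrss");         // true
// Escaped, it only matches a real dot.
var escaped = /\.rss/.test("cnn_worldxrss");         // false
// Inside [], the dot is already literal: [.] behaves like \.
var inClassNoDot = /[.]rss/.test("cnn_worldxrss");   // false
var inClassWithDot = /[.]rss/.test("cnn_world.rss"); // true

console.log(wildcard, escaped, inClassNoDot, inClassWithDot);
```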

Try using substring:

string.substring(from, to)

together with the lastIndexOf method:

string.lastIndexOf(searchvalue)

Combining them:

var result = str.substring(0, str.toLowerCase().lastIndexOf("rss") + 3);

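Finally, to handle both .rss and .html (and leave the URL unchanged when neither occurs):

var lower = str.toLowerCase();
var rssStart = lower.lastIndexOf(".rss");
var htmlStart = lower.lastIndexOf(".html");
var result = str; // unchanged if neither extension occurs
if (rssStart > htmlStart) {
  result = str.substring(0, rssStart + 4);  // keep ".rss"
} else if (htmlStart !== -1) {
  result = str.substring(0, htmlStart + 5); // keep ".html"
}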
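The same index comparison can be wrapped in a small helper and checked against the thread's test URLs. The name trimAfterLastFeedExtension is hypothetical; like the answer, it compares case-insensitively, and it returns the input unchanged when neither extension occurs.

```javascript
// Hypothetical helper: cut a URL right after its last ".rss" or ".html".
function trimAfterLastFeedExtension(url) {
  var lower = url.toLowerCase(); // case-insensitive search
  var extensions = [".rss", ".html"];
  var bestStart = -1; // start index of the last extension found so far
  var bestEnd = -1;   // index just past that extension
  for (var i = 0; i < extensions.length; i++) {
    var start = lower.lastIndexOf(extensions[i]);
    if (start > bestStart) {
      bestStart = start;
      bestEnd = start + extensions[i].length;
    }
  }
  // No extension found: leave the URL as it is.
  return bestEnd === -1 ? url : url.substring(0, bestEnd);
}

console.log(trimAfterLastFeedExtension("http://rss.cnn.com/rss/cnn_world.rsstest"));
// http://rss.cnn.com/rss/cnn_world.rss
```

Unlike a regex, this needs no escaping; the trade-off is that each new extension means another entry in the extensions array.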

3 Comments

Edited to handle lower and upper case and to fix my other errors ;-)
You can check whether .rss or .html comes last by comparing the indices of both, and then decide which string to use in lastIndexOf.
There are many ways to skin a cat. I think a regex-based solution is more appropriate here. Interesting, though!

Why don't you do

var str="http://rss.cnn.com/rss/cnn_world.rsstest";
str = str.replace(/test$/, ""); // replace() returns a new string; it does not modify str

2 Comments

"test" is just my sample; ideally it should remove anything after .rss,
because it may not be "test"; it can be anything.
