I need to use a regex to search a string and fix certain URLs. Specifically, I need to remove the leading / from links like these:

/admin.somedomain.com or /somedomain.com

There are a lot of other URLs with absolute paths, so I can't just strip out every leading / character. Any help would be greatly appreciated.

This is user-entered text, usually HTML from TinyMCE but sometimes from plain text boxes, with or without other HTML. (Otherwise I would handle it differently and work on the links directly instead of having to search a string for them first.) Unfortunately, the URLs for a href, img src, etc. are sometimes entered incorrectly.

I do want to keep links like "/image.jpg" and "/webapp/getfile?id=3354"

but not links like "/somedomain.com" or "/admin.somedomain.com".

Here is an example of text I might need to clean up:

<p><a href="/webapp/GetFile?id={2C59BC2D}"><img src="/wahelper/GetImage?id=308" alt="" width="100" height="100" /></a></p> <p><a href="/admin.somedomain.com">test</a></p>
  • Unless there is another way to do it. I am open to suggestions. Commented Aug 31, 2012 at 17:33
  • Regex may be part of your solution, but I don't think regex alone will be your be-all and end-all, unless you can come up with something seriously intricate. I would likely compare all the URLs I have stored: split each one at : and take the first block, then check whether it contains more than one /. If it does, leave it alone; if it doesn't, remove the first character when that character is a /. Commented Aug 31, 2012 at 17:34
  • If there is no :, then I would look at the leading characters and see whether the URL starts with // or /. Commented Aug 31, 2012 at 17:35
  • So how are you getting these links? From a database? Can we see that code? Commented Aug 31, 2012 at 17:45
  • Please edit your original question with enhancements, instead of adding details in comments -- it's easier to understand your questions that way. I'd like to see a list of URLs you are dealing with, so I can see what a regexp should match and what it shouldn't. Thanks. Commented Aug 31, 2012 at 22:43

1 Answer

Jeez, such a hard time to get a simple regex. Try this:

$str = preg_replace( "/^\/((?:admin\.)?[^.]+\.(?:com|net|other_TLD_you_want))/i", "http://$1", $str);

Note that I've actually replaced / with http:// because that's really what you want if you want the link to work. If you just strip off the /, the link becomes a relative link to a local file called admin.somedomain.com in the current directory, which is probably not what you want.

Also note that you might want more TLDs than just com and net - add them as you want.

Also note that this won't work for other country TLDs like co.uk
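
The pattern above is anchored with ^, so it only matches when the string begins with the bad URL. For a block of HTML like the one in the question, something along the following lines may be closer to what is needed. This is only a rough sketch under a few assumptions: the function name fixDomainLinks and the suffix list are made up for illustration, only href and src attributes with quoted values are handled, and a genuine local file whose name ends in one of the listed suffixes (say /archive.net) would be rewritten too.

```php
<?php
// Hypothetical helper: the name and the default suffix list are
// illustrative assumptions, not part of the original answer.
function fixDomainLinks(string $html, array $suffixes = ['com', 'net', 'co.uk']): string
{
    // Escape each suffix so that "co.uk" matches a literal dot.
    $alternation = implode('|', array_map(function ($suffix) {
        return preg_quote($suffix, '~');
    }, $suffixes));

    // Group 1: href=" or src=" (either quote style).
    // Group 2: one or more host labels followed by a known suffix.
    // The lookahead requires the host to end at /, ?, # or the closing
    // quote, so "/example.community" is not mistaken for "example.com".
    $pattern = '~\b((?:href|src)\s*=\s*["\'])/((?:[a-z0-9-]+\.)+(?:'
        . $alternation
        . '))(?=[/?#"\'])~i';

    // Drop the stray leading slash and prepend the scheme instead.
    return preg_replace($pattern, '${1}http://${2}', $html);
}

$html = '<p><a href="/webapp/GetFile?id=3354"><img src="/image.jpg" alt="" /></a></p>'
    . ' <p><a href="/admin.somedomain.com">test</a></p>';

echo fixDomainLinks($html), PHP_EOL;
```

Paths such as /image.jpg or /webapp/GetFile?id=3354 are left alone because their first segment does not end in one of the listed suffixes. Passing the suffixes as an array keeps the list easy to extend, and multi-part suffixes like co.uk work as long as they are listed explicitly.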
