9

I have a function that will add the <a href> tag before a link and </a> after the link. However, it breaks for some webpages. How would you improve this function? Thanks!

function processString($s) 
{
    // check if there is a link

    if(preg_match("/http:\/\//",$s))
    {
        print preg_match("/http:\/\//",$s);


        $startUrl =  stripos($s,"http://");

        // if the link is in between text
        if(stripos($s," ",$startUrl)){
            $endUrl = stripos($s," ",$startUrl);
        }
        // if link is at the end of string
        else {$endUrl = strlen($s);}

        $beforeUrl = substr($s,0,$startUrl);
        $url = substr($s,$startUrl,$endUrl-$startUrl);
        $afterUrl = substr($s,$endUrl);

        $newString = $beforeUrl."<a href=\"$url\">".$url."</a>".$afterUrl;

        return $newString;
    }

    return $s;
}
5 Comments
  • The regex is a little sloppy, but 99% of my input will have correct URLs if any Commented Nov 18, 2010 at 16:53
  • What webpages does it break for? Commented Nov 18, 2010 at 16:54
  • At the beginning you test against https as well, but later you omit the "s". I don't know whether this causes the error, because I also don't know which pages are broken ;) Commented Nov 18, 2010 at 16:58
  • Sorry, I removed the [s] from the regex. How could I include functionality for strings such as "www.google.com", or "https:www.example.com" ? Commented Nov 18, 2010 at 17:00
  • "www.google.com" is going to be harder to parse. You need a long regex just to accommodate all TLDs. Commented Nov 18, 2010 at 17:03
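
One way to approach the "www.google.com" case raised in the comments, without listing every TLD, is to treat a leading "www." as a second trigger next to the scheme. The following is only a sketch under that assumption: the function name linkifyWithWww and the choice to prepend "http://" to bare "www." matches are hypothetical and not from the thread, and it does not handle malformed input such as "https:www.example.com".

```php
<?php
// Hypothetical helper (not from the thread): links "http(s)://..." and bare "www...." URLs.
function linkifyWithWww(string $s): string
{
    // Trigger on an explicit scheme or a leading "www."; stop at whitespace, quotes, or angle brackets.
    $pattern = '~\b(?:https?://|www\.)[^\s<>"\']+~i';

    return preg_replace_callback($pattern, function (array $m): string {
        $text = $m[0];
        // Assumption: bare "www." matches get "http://" prepended, in the href only.
        $href = stripos($text, 'http') === 0 ? $text : 'http://' . $text;

        return '<a href="' . htmlspecialchars($href, ENT_QUOTES, 'UTF-8') . '">'
            . htmlspecialchars($text, ENT_QUOTES, 'UTF-8') . '</a>';
    }, $s);
}

echo linkifyWithWww('See www.google.com and http://example.com/a?b=1 today');
```

Because the match runs up to the next whitespace, trailing punctuation (for example the period in "www.google.com.") ends up inside the link, so real input may need an extra trimming step.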

3 Answers

21
function processString($s) {
    return preg_replace('/https?:\/\/[\w\-\.!~#?&=+\*\'"(),\/]+/','<a href="$0">$0</a>',$s);
}

5 Comments

I think a "=" is missing: it fails when the URL contains GET parameters. I just added it after the "&" and now it works: preg_replace('/https?:\/\/[\w\-\.!~?&=+\*\'"(),\/]+/','<a href="$0">$0</a>',$s)
You forgot about addresses containing "#", so a more correct version is preg_replace('/https?:\/\/[\w\-\.!~#?&=+\*\'"(),\/]+/','<a href="$0">$0</a>',$text)
I just edited the answer to reflect these two additions.
Do not forget to add the 'u' modifier if your string may contain UTF-8 characters.
I added one more character, "%". The result: '/https?:\/\/[\w\-%\.!~#?&=+*\'"(),\/]+/','<a href="$0">$0</a>'
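
Putting the suggestions from these comments together ("=", "#", "%" in the character class, plus the 'u' modifier) might look like the sketch below. The function name processStringUtf8 is hypothetical, and dropping the two quote characters from the class is an assumption of this sketch rather than something the thread proposes: a double quote inside a match would otherwise close the href attribute early.

```php
<?php
// Sketch only: the answer's pattern with the additions suggested in the comments.
function processStringUtf8(string $s): string
{
    // Character class extended with "%"; "u" treats pattern and subject as UTF-8.
    // Assumption of this sketch: ' and " are left out so a match cannot end the href attribute.
    $pattern = '/https?:\/\/[\w\-.!~#?&=+*%(),\/]+/u';

    $result = preg_replace($pattern, '<a href="$0">$0</a>', $s);

    // With "u", preg_replace() returns null on invalid UTF-8 input; fall back to the original string.
    return $result ?? $s;
}

echo processStringUtf8('go to http://example.com/a%20b?x=1#top now');
```

The null check matters once the 'u' modifier is in place, because a subject that is not valid UTF-8 makes preg_replace() fail instead of returning the input unchanged.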
1

It breaks for all URLs that contain "special" HTML characters. To be safe, pass the three string components through htmlspecialchars() before concatenating them together (unless you want to allow HTML outside the URL).
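
As a rough illustration of that advice, here is the question's single-URL logic with each of the three components escaped before concatenation. The function name processStringEscaped is hypothetical, and the sketch assumes that no raw HTML should pass through outside the link.

```php
<?php
// Sketch of the escaping advice applied to the question's single-URL logic.
function processStringEscaped(string $s): string
{
    $startUrl = stripos($s, 'http://');
    if ($startUrl === false) {
        // No link found: still escape, assuming raw HTML is not wanted.
        return htmlspecialchars($s, ENT_QUOTES, 'UTF-8');
    }

    // The URL ends at the next space, or at the end of the string.
    $endUrl = strpos($s, ' ', $startUrl);
    if ($endUrl === false) {
        $endUrl = strlen($s);
    }

    // Escape each of the three components separately before concatenating.
    $beforeUrl = htmlspecialchars(substr($s, 0, $startUrl), ENT_QUOTES, 'UTF-8');
    $url       = htmlspecialchars(substr($s, $startUrl, $endUrl - $startUrl), ENT_QUOTES, 'UTF-8');
    $afterUrl  = htmlspecialchars(substr($s, $endUrl), ENT_QUOTES, 'UTF-8');

    return $beforeUrl . '<a href="' . $url . '">' . $url . '</a>' . $afterUrl;
}

echo processStringEscaped('a <b> http://example.com/?x=1&y=2 end');
```

With this input the "<b>" before the link is emitted as "&lt;b&gt;" and the "&" inside the URL becomes "&amp;" in both the href and the link text, so neither can break the surrounding markup.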

1
function processString($s){
  return preg_replace('@((https?://)?([-\w]+\.[-\w\.]+)+\w(:\d+)?(/([-\w/_\.]*(\?\S+)?)?)*)@', '<a href="$1">$1</a>', $s);
}

Found it here

1 Comment

I need the same thing in jQuery/JavaScript. Can anyone help?
