0

I'm having a problem converting plain-text URLs into links. If the text contains something like www.google.com, I'd like it converted to:

<a href="www.google.com" target="_blank">www.google.com</a>

I'm kind of a RegEx noob, but I tried this:

$description = preg_replace('@(www.([-\w\.]+[-\w])+(:\d+)?(/([\w/_\.#-]*(\?\S+)?[^\.\s])?)?)@', '<a href="$1" target="_blank">$1</a>', $description);

The $description variable holds a piece of text that can contain unconverted URLs.

With the code above, I get this link:

<a target="_blank">www.google.com</a>

So the href part is left out. This must be a piece of cake for you RegEx wizards out there, so thanks in advance for any help.

If there is another (better?) way to convert plain text to URLs, let me know and I'll try it.

7
  • I've tried running your code and it works perfectly. Which PHP version are you using? Commented Feb 23, 2012 at 10:01
  • Can you post an example value for $description? Commented Feb 23, 2012 at 10:13
  • Here you go: En je bent overal welkom als je maar breeddenkend bent!" Tempo (www.temponieuwsbrief.be) mocht op kotbezoek! Commented Feb 23, 2012 at 10:14
  • Either you found a bug in PHP or you're not debugging correctly. That text does work in PHP 5.3.3, 5.3.6 and 5.3.10. Run the contents of pastebin.com/YqqQRSnV in its own file and let me know if that works. Commented Feb 23, 2012 at 10:18
  • I'm not a PHP guy, but I fail to see how this could be a regex issue. Your replacement string is static and has href in it, so how could the regex remove it? It must be something downstream. Commented Feb 23, 2012 at 10:20

4 Answers

2

If your only problem is that the link incorrectly points towards www.google.com instead of the fully qualified URL, such as http://www.google.com, then the correct replacement would be:

$description = preg_replace('@(www.([-\w\.]+[-\w])+(:\d+)?(/([\w/_\.#-]*(\?\S+)?[^\.\s])?)?)@', '<a href="http://$1" target="_blank">$1</a>', $description);
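
As a quick check of this replacement, here is a minimal sketch that runs it against the sample text posted in the comments. The only change to the pattern is an assumption on my part: the dot after www is escaped so that it matches a literal dot rather than any character.

```php
<?php
// Sample text taken from the comment thread above.
$description = 'Tempo (www.temponieuwsbrief.be) mocht op kotbezoek!';

// Same pattern as in the question, with the dot after "www" escaped
// (an assumption; the original uses an unescaped dot).
$pattern = '@(www\.([-\w\.]+[-\w])+(:\d+)?(/([\w/_\.#-]*(\?\S+)?[^\.\s])?)?)@';

// Prepend the scheme in the href only; the visible link text stays as typed.
$linked = preg_replace($pattern, '<a href="http://$1" target="_blank">$1</a>', $description);

echo $linked, "\n";
```

Note that this pattern also matches the www part of a URL that already carries a scheme, so text containing http://www.example.com would need separate handling.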

1

<a href="www.example.com">www.example.com</a> will not work as intended, because browsers treat an href without a scheme as a relative URL and resolve it against the current page, e.g. http://example.com/www.example.com. You need to specify the protocol (http, https, etc.).
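
One way to guard against this is to check for a scheme before building the link. The sketch below is only an illustration: ensure_scheme() is a hypothetical helper, and defaulting to http is an assumption.

```php
<?php
// ensure_scheme() is a hypothetical helper, not part of the answer above.
function ensure_scheme($url, $default = 'http')
{
    // parse_url() yields null for the scheme component when none is present.
    $scheme = parse_url($url, PHP_URL_SCHEME);

    // Leave URLs that already have a scheme untouched; prefix the rest.
    return $scheme ? $url : $default . '://' . $url;
}

echo ensure_scheme('www.example.com'), "\n";
echo ensure_scheme('https://example.com'), "\n";
```

Here the bare www.example.com gets the default scheme prepended, while the https URL is returned unchanged.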

The following will replace all text "links" starting with www, ftp, http, https or file with HTML a tags:

<?php

    $pattern = '/(www|ftp|http|https|file)(:\/\/)?[\S]+(\b|$)/i';
    $string = 'hello http://example.com https://graph.facebook.com    http://www.example.com www.google.com';

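    // Callback: wrap each match in an a tag, adding a scheme to bare "www" links.
    function create_a_tags( $matches ){

        $url = $matches[0];
        // The pattern is case-insensitive, so compare in lower case.
        if ( 'www' == strtolower( $matches[1] ) ){
            $url = 'http://' . $matches[0];
        }
        // Escape both the attribute value and the link text for HTML output.
        return sprintf( '<a href="%s">%s</a>', htmlspecialchars($url), htmlspecialchars($matches[0]) );
    }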

    echo preg_replace_callback( $pattern, 'create_a_tags', $string );

?>

prints

hello <a href="http://example.com">http://example.com</a>
<a href="https://graph.facebook.com">https://graph.facebook.com</a>
<a href="http://www.example.com">http://www.example.com</a>
<a href="http://www.google.com">www.google.com</a>

2 Comments

But what if the text is like this: www.google.com, and I want to get it like this: <a href="google.com"></a>?
I've edited the code above to handle www URLs as well (by adding http:// to the href attribute), but it may now create some false positives (I haven't tested it).
0

Quite a while ago we compared different approaches to URL verification and identification. See the table of regular expressions.

I suggest you drop your regex and use the "Gruber revised" pattern from that table instead. A solution (requires PHP 5.3 for the anonymous function) could look like:

<?php

$string = 'hello 
http://example.com 
https://graph.facebook.com 
http://www.example.com
www.google.com
ftp://example.com';

$string = preg_replace_callback('#(?i)\b((?:[a-z][\w-]+:(?:/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:\'".,<>?«»“”‘’]))#iS', function($m) {
    // use http as default protocol, if none given
    if (strpos($m[0], '://') === false) {
        $m[0] = 'http://' . $m[0];
    }
    // text -> html is a context switch, take care of special characters
    $_m = htmlspecialchars($m[0]);
    return '<a href="' . $_m . '" target="_blank">' . $_m . '</a>';
}, $string);

echo $string, "\n";
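
To show what the htmlspecialchars() step in the callback does, here is a small standalone sketch. The URL is a made-up example, and passing ENT_QUOTES and 'UTF-8' explicitly is my assumption rather than something the code above does.

```php
<?php
// Hypothetical URL containing a character that is special in HTML.
$url = 'http://example.com/search?q=a&lang=en';

// ENT_QUOTES also escapes single quotes, which matters inside attributes.
$safe = htmlspecialchars($url, ENT_QUOTES, 'UTF-8');

// The same escaped value is safe as both the attribute and the link text.
$link = '<a href="' . $safe . '" target="_blank">' . $safe . '</a>';

echo $link, "\n";
```

The ampersand in the query string comes out as &amp;amp; in the markup, which the browser decodes back to & when it follows the link.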

4 Comments

There isn't anything fundamentally wrong with the regex he's currently using; it's the generated markup that doesn't look valid (no scheme on the href).
I never said there was anything wrong with his regex. I just explained there's a better one. Also, this solution is the only one sanitizing the URL for use in HTML. Something I do think is important to mention. If you're interested only in answering the core question without looking at the bigger picture - be my guest and downvote all you want…
It's not compiling well, I get this error: Parse error: syntax error, unexpected T_CONSTANT_ENCAPSED_STRING (on $string = preg_replace_callback('#(?i)\b((?:[a-z][\w-]+:(?:/{1,3}|[a-z...)
rodneyrehm - there is "bigger picture" and then there's misdirection.
0

I've found the solution. As suspected in the comments, it didn't have anything to do with the RegEx. My coworker had added this line of jQuery code in the head:

$("a").removeAttr('href');

So obviously the href attribute was being removed from every link after the page loaded. I didn't look there because I was sure this was a PHP/regex problem. Removing that line fixed the problem.

I realize this was a stupid error and it was impossible for you to solve this, so thanks all for helping, +1 to you guys.