Extract URL from string

Question

I'm trying to find a reliable way to extract a URL from a string of characters. I have a site where users answer questions, and in the source box, where they enter their source of information, I allow them to enter a URL. I want to extract that URL and turn it into a hyperlink, similar to how Yahoo Answers does it.

Does anyone know a reliable solution that can do this?

All the solutions I have found work for some URLs but not for others.

Thanks

user113292 · Accepted Answer · 2010-12-08 18:09:07Z

23

John Gruber has spent a fair amount of time perfecting the "one regex to rule them all" for link detection. Used with preg_replace(), as mentioned in the other answers, the following regex should be one of the most accurate methods, if not the most accurate, for detecting a link:

(?i)\b((?:[a-z][\w-]+:(?:/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'".,<>?«»“”‘’]))

If you only wanted to match HTTP/HTTPS:

(?i)\b((?:https?://|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'".,<>?«»“”‘’]))

answered Dec 8, 2010 at 18:09

user113292

4 Comments

Highly Irregular Over a year ago

For anyone who wants all the sub-patterns converted to be non-capturing, and the forward slashes escaped: \b(?:(?:[a-z][\w-]+:(?:\/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}\/)(?:[^\s()<>]+|\((?:[^\s()<>]+|(?:\([^\s()<>]+\)))*\))+(?:\((?:[^\s()<>]+|(?:\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'".,<>?«»“”‘’]))

Toto Over a year ago

TLDs may have much more than 4 characters, see: iana.org/domains/root/db
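As a rough illustration of the TLD-length point, here is a minimal sketch that compares the bare-domain part of the pattern with its original {2,4} limit against a relaxed {2,63} limit. The value 63 is an assumption chosen because it is the maximum length of a DNS label; the variable names and the sample input are hypothetical.

```php
<?php
// Bare-domain alternative taken from the pattern, with two TLD length limits.
$short = '~\b[a-z0-9.\-]+[.][a-z]{2,4}/~i';  // original limit from the answer
$long  = '~\b[a-z0-9.\-]+[.][a-z]{2,63}/~i'; // assumed bound: maximum DNS label length

$text = 'example.photography/gallery'; // hypothetical input with a long TLD

// preg_match() returns 1 on a match and 0 when nothing matches.
var_dump(preg_match($short, $text));
var_dump(preg_match($long, $text));
```

Loosening the limit also makes the pattern more willing to treat ordinary "word.word/" text as a link, so it is a trade-off rather than a pure fix.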

Linesofcode Over a year ago

And how do we use this regex within preg? I mean, because it has " and ' the code doesn't work properly, like: preg_match('(?i)\b......]))', $str) - all code seems like it is commented.

Aakash Sahai Over a year ago

Not working. preg_match and preg_match_all fail every time, even after removing the single/double quotes.
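On the question of using this pattern with the preg functions: PHP's PCRE functions expect the pattern to be wrapped in delimiters, and the quotes inside the pattern have to survive PHP's string syntax. A minimal sketch, assuming "~" as the delimiter (it does not occur in the pattern), a nowdoc so nothing needs escaping, and the "u" modifier because the character class contains multibyte quote characters. The sample text and variable names are made up for illustration.

```php
<?php
// Gruber's HTTP/HTTPS pattern from the answer, wrapped in "~" delimiters.
// The nowdoc keeps the quotes, backtick and backslashes literal;
// the trailing "u" modifier treats pattern and subject as UTF-8.
$pattern = <<<'REGEX'
~(?i)\b((?:https?://|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'".,<>?«»“”‘’]))~u
REGEX;

$text = 'See http://example.com/page, and www.example.org.'; // sample input (assumption)

preg_match_all($pattern, $text, $matches);
$urls = $matches[0]; // index 0 holds the full matches
print_r($urls);
```

With the "u" modifier the subject must be valid UTF-8; otherwise preg_match_all() returns false instead of a match count.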

Jonah · Accepted Answer · 2010-12-08 18:14:48Z

3

$string = preg_replace('/https?:\/\/[^\s"<>]+/', '<a href="$0" target="_blank">$0</a>', $string);

It only matches http/https, but that's really the only protocol you want to turn into a link. If you want others, you can change it like this:

$string = preg_replace('/(https?|ssh|ftp):\/\/[^\s"]+/', '<a href="$0" target="_blank">$0</a>', $string);

edited Dec 8, 2010 at 18:14

answered Dec 8, 2010 at 17:57

Jonah

3 Comments

Gumbo Over a year ago

You might also want to exclude < or apply htmlspecialchars on the matched string to avoid code injection.

Jonah Over a year ago

Good, but if you look at the expression, it allows anything but white-space and ". I believe that eliminates any HTML injection.

Gumbo Over a year ago

Bron: No, you are using the matched value not just as an attribute value but also as the element's text content.
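One way to address the escaping concern from these comments is to escape every piece of the text, URL or not, before it reaches the page. The sketch below is only one possible approach, not the answer's own code: the function name linkifyEscaped() is hypothetical, and the rel="noopener" attribute is an addition that the original snippet does not have.

```php
<?php
function linkifyEscaped(string $text): string
{
    // Split on URLs; with PREG_SPLIT_DELIM_CAPTURE the captured URLs
    // land at the odd indexes, plain text at the even ones.
    $parts = preg_split('~(https?://[^\s"<>]+)~i', $text, -1, PREG_SPLIT_DELIM_CAPTURE);

    $html = '';
    foreach ($parts as $i => $part) {
        // Escape everything, so neither the link nor the surrounding text can inject markup.
        $safe = htmlspecialchars($part, ENT_QUOTES, 'UTF-8');
        $html .= ($i % 2 === 1)
            ? '<a href="' . $safe . '" target="_blank" rel="noopener">' . $safe . '</a>'
            : $safe;
    }
    return $html;
}

echo linkifyEscaped('Tom & Jerry: http://example.com/?a=1&b=2 <b>'), "\n";
```

Because the non-URL text is escaped as well, the result can be echoed into an HTML page as-is, which is not true of the raw preg_replace() output.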

vstelmakh · Accepted Answer · 2020-01-25 19:10:16Z

2

There are a lot of edge cases with URLs. For example, a URL could contain brackets, or it might not include a protocol. That's why a regex alone is not enough.

I created a PHP library that deals with many of these edge cases: Url highlight.

You can extract URLs from a string or highlight them directly.
Example:

<?php

use VStelmakh\UrlHighlight\UrlHighlight;

$urlHighlight = new UrlHighlight();

// Extract urls
$urlHighlight->getUrls("This is example http://example.com.");
// return: ['http://example.com']

// Make urls as hyperlinks
$urlHighlight->highlightUrls('Hello, http://example.com.');
// return: 'Hello, <a href="http://example.com">http://example.com</a>.'

For more details see readme. For covered url cases see test.

edited Jan 25, 2020 at 19:10

answered Jan 25, 2020 at 18:58

vstelmakh

wallyk · Accepted Answer · 2010-12-08 17:56:31Z

0

Yahoo! Answers does a fairly good job of link identification when the link is written properly and separated from other text, but it isn't very good at separating trailing punctuation. For example, "The links are http://example.com/somepage.php, http://example.com/somepage2.php, and http://example.com/somepage3.php." will include the commas on the first two links and the period on the third.

But if that is acceptable, then patterns like this should do it:

\<http:[^ ]+\>
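If the trailing punctuation is not acceptable, one simple workaround is to match greedily and then trim punctuation from the end of each match. This is a minimal sketch under that assumption: extractUrlsTrimmed() is a hypothetical name, and the set of trimmed characters is a guess at what usually ends a sentence rather than a URL.

```php
<?php
function extractUrlsTrimmed(string $text): array
{
    // Grab everything from the scheme up to the next whitespace, quote or angle bracket.
    preg_match_all('~https?://[^\s"<>]+~i', $text, $matches);

    // Strip punctuation that more likely ends the sentence than the URL.
    return array_map(function (string $url): string {
        return rtrim($url, '.,;:!?');
    }, $matches[0]);
}

$text = 'The links are http://example.com/somepage.php, http://example.com/somepage2.php, '
      . 'and http://example.com/somepage3.php.';
print_r(extractUrlsTrimmed($text));
```

The trade-off is that a URL which genuinely ends in one of these characters loses it.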

It looks like Stack Overflow's parser is better. Is it open source?

answered Dec 8, 2010 at 17:56

wallyk

1 Comment

DampeS8N Over a year ago

Smarter, but still not perfect. It misses things like ssh+svn.

Paras Dalsaniya · Accepted Answer · 2015-09-30 13:27:02Z

-1

This code worked for me.

function makeLink($string){

    /*** make sure there is an http:// on all URLs ***/
    $string = preg_replace("/([^\w\/])(www\.[a-z0-9\-]+\.[a-z0-9\-]+)/i", "$1http://$2",$string);
    /*** make all URLs links ***/
    $string = preg_replace("/([\w]+:\/\/[\w-?&;#~=\.\/\@]+[\w\/])/i","<a target=\"_blank\" href=\"$1\">$1</a>",$string);
    /*** make all emails hot links ***/
    $string = preg_replace("/([\w-?&;#~=\.\/]+\@(\[?)[a-zA-Z0-9\-\.]+\.([a-zA-Z]{2,3}|[0-9]{1,3})(\]?))/i","<a href=\"mailto:$1\">$1</a>",$string);

    return $string;
}

answered Sep 30, 2015 at 13:27

Paras Dalsaniya

1 Comment

Toto Over a year ago

Why are you limiting tld to 3 characters? Have a look at: iana.org/domains/root/db
