I'm looking for the best regex to detect URLs in text. After trying many, I came across this article, in which the author shows his regex to be the most robust of the many he compared. I'm trying to get this regex to work in Ruby and JavaScript, but both Rubular and Regexpal give me errors, and my attempts to fix them produce no matches. Much love to anyone who can help me translate this regex into Ruby- and JavaScript-compatible versions.

_^(?:(?:https?|ftp)://)(?:\S+(?::\S*)?@)?(?:(?!10(?:\.\d{1,3}){3})(?!127(?:\.\d{1,3}){3})(?!169\.254(?:\.\d{1,3}){2})(?!192\.168(?:\.\d{1,3}){2})(?!172\.(?:1[6-9]|2\d|3[0-1])(?:\.\d{1,3}){2})(?:[1-9]\d?|1\d\d|2[01]\d|22[0-3])(?:\.(?:1?\d{1,2}|2[0-4]\d|25[0-5])){2}(?:\.(?:[1-9]\d?|1\d\d|2[0-4]\d|25[0-4]))|(?:(?:[a-z\x{00a1}-\x{ffff}0-9]+-?)*[a-z\x{00a1}-\x{ffff}0-9]+)(?:\.(?:[a-z\x{00a1}-\x{ffff}0-9]+-?)*[a-z\x{00a1}-\x{ffff}0-9]+)*(?:\.(?:[a-z\x{00a1}-\x{ffff}]{2,})))(?::\d{2,5})?(?:/[^\s]*)?$_iuS

3 Answers

1

Have you seen the source? There are Ruby and JS ports embedded: gist.github.com/dperini/729294.
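
For readers who want to see what such a port involves, here is a minimal sketch of a mechanical JavaScript translation of the pattern from the question. It is not copied from the gist, and the names `URL_PARTS`, `URL_RE` and `isUrl` are made up for illustration. The changes it assumes are needed: drop the PCRE `_` delimiters and the trailing `iuS` modifiers (passing `i` as a flag instead), and rewrite `\x{00a1}-\x{ffff}` as `\u00a1-\uffff`. Building the pattern from `String.raw` pieces avoids escaping every `/`.

```javascript
// Sketch only: a mechanical translation of the PCRE pattern, not the gist's own port.
// URL_PARTS, URL_RE and isUrl are illustrative names.
const URL_PARTS = [
  String.raw`^(?:(?:https?|ftp)://)`,               // scheme
  String.raw`(?:\S+(?::\S*)?@)?`,                   // optional user:password@
  String.raw`(?:`,
  // negative lookaheads: reject private, loopback and link-local IPv4 ranges
  String.raw`(?!10(?:\.\d{1,3}){3})`,
  String.raw`(?!127(?:\.\d{1,3}){3})`,
  String.raw`(?!169\.254(?:\.\d{1,3}){2})`,
  String.raw`(?!192\.168(?:\.\d{1,3}){2})`,
  String.raw`(?!172\.(?:1[6-9]|2\d|3[0-1])(?:\.\d{1,3}){2})`,
  // IPv4 dotted quad
  String.raw`(?:[1-9]\d?|1\d\d|2[01]\d|22[0-3])`,
  String.raw`(?:\.(?:1?\d{1,2}|2[0-4]\d|25[0-5])){2}`,
  String.raw`(?:\.(?:[1-9]\d?|1\d\d|2[0-4]\d|25[0-4]))`,
  String.raw`|`,
  // host name, further domain labels, then a letters-only TLD
  String.raw`(?:(?:[a-z\u00a1-\uffff0-9]+-?)*[a-z\u00a1-\uffff0-9]+)`,
  String.raw`(?:\.(?:[a-z\u00a1-\uffff0-9]+-?)*[a-z\u00a1-\uffff0-9]+)*`,
  String.raw`(?:\.(?:[a-z\u00a1-\uffff]{2,}))`,
  String.raw`)`,
  String.raw`(?::\d{2,5})?`,                        // optional port
  String.raw`(?:/[^\s]*)?$`,                        // optional path
];

// No "g" flag, so test() keeps no state between calls.
const URL_RE = new RegExp(URL_PARTS.join(""), "i");

function isUrl(candidate) {
  return URL_RE.test(candidate);
}

console.log(isUrl("http://example.com/path?q=1"));        // true
console.log(isUrl("ftp://user:pass@example.org:8080/f")); // true
console.log(isUrl("http://192.168.1.1/"));                // false (private range)
console.log(isUrl("example.com"));                        // false (no scheme)
```

Because of the `^` and `$` anchors, this validates a whole string rather than finding URLs inside longer text. A Ruby version would likely also want `\A` and `\z` in place of `^` and `$`, since in Ruby those two anchor at line boundaries rather than at the ends of the string.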

1

Ruby:

result = subject.scan(/http[s]?:\/\/(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*(),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+/)

JavaScript:

result = subject.match(/http[s]?:\/\/(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*(),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+/g);

A “perfect URL validation regex” that works in both Ruby and JavaScript is probably:

http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\(\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+
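
Before adopting this pattern, it may help to see what it actually captures. The following is a small sketch (the name `extractUrls` and the sample strings are made up for illustration). One detail worth knowing: inside `[$-_@.&+]`, the `$-_` part is a character range running from `$` to `_`, which is what lets `/`, `:`, `?`, `=` and `,` through, while `#` and `~` fall outside every class.

```javascript
// The pattern from this answer, with the "g" flag so match() returns every hit.
const LOOSE_URL = /http[s]?:\/\/(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*(),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+/g;

// extractUrls is an illustrative helper name, not part of the answer.
function extractUrls(text) {
  return text.match(LOOSE_URL) || [];
}

console.log(extractUrls("See http://example.com/a?b=1 and https://ruby-lang.org, then stop."));
// [ 'http://example.com/a?b=1', 'https://ruby-lang.org,' ]

console.log(extractUrls("http://example.com/page#top"));
// [ 'http://example.com/page' ]
```

So the pattern finds URLs inside running text, but it keeps trailing punctuation such as the comma and cuts the URL off at a `#` fragment. It also only looks for `http` and `https`.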

0

DMKE answered my original question best by linking me to source I'd overlooked, so I accepted his answer. But after testing @diegoperini's regex, I was a bit underwhelmed. I ultimately settled on the following regex, which I found on Daring Fireball:

(?i)\b((?:[a-z][\w-]+:(?:/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'".,<>?«»“”‘’]))

It is liberal: it accepts port numbers and links without http: or www., yet it still passed my tests. Plus, it is simple and easy to read. So I would recommend this regex to anyone who wants a quick, liberal regex for URLs.
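
As a rough sketch of how the Daring Fireball pattern might be used from JavaScript: the leading `(?i)` is not accepted there, so this version drops it and passes the `i` flag instead, escapes each `/`, and keeps the literal parentheses and brackets backslash-escaped. The names `LIBERAL_URL` and `findUrls` and the sample strings are made up for illustration.

```javascript
// Daring Fireball pattern as a JavaScript literal: (?i) replaced by the "i" flag,
// "/" escaped, and "g" added so match() returns every URL in the text.
const LIBERAL_URL = /\b((?:[a-z][\w-]+:(?:\/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}\/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'".,<>?«»“”‘’]))/gi;

// findUrls is an illustrative helper name.
function findUrls(text) {
  return text.match(LIBERAL_URL) || [];
}

console.log(findUrls("Visit www.example.com:8080/path, then bit.ly/abc."));
// [ 'www.example.com:8080/path', 'bit.ly/abc' ]

console.log(findUrls("See http://example.com/wiki/Foo_(bar)."));
// [ 'http://example.com/wiki/Foo_(bar)' ]
```

Unlike a stricter validator, this leaves trailing commas and periods out of the match while keeping a balanced pair of parentheses inside the URL.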
