Issues with my regex to detect URLs in a string

Question

Greetings all. I am using the following regex to detect URLs in a string and wrap them in an <a> tag:

public static String detectUrls(String text) {
    String newText = text
            .replaceAll("(?:https?|ftps?|http?)://[\\w/%.-?&=]+",
                    "<a href='$0'>$0</a>")
            .replaceAll("(www\\.)[\\w/%.-?&=]+",
                    "<a href='http://$0'>$0</a>");
    return newText;
}

My problem is that the following links are not detected correctly. I am not that good with regex, so please advise.

http://code.google.com/p/shindig-dnd/

http://confluence.atlassian.com/display/GADGETDEV/Gadgets+and+JIRA+Portlets

www.liferay.com/web/raymond.auge/blog/

(www.opensocial.org/)

http://www.google.com

Check out stackoverflow.com/questions/161738/…

– ismail, Commented Dec 23, 2010 at 13:31

Bozho · Accepted Answer · 2010-12-23 13:31:38Z

3

I'm using this:

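// Requires java.util.regex.Pattern and java.util.regex.Matcher.
private static final String URL_REGEX = 
   "http(s)?://([\\w+?\\.\\w+])+([a-zA-Z0-9\\~\\!\\@\\#\\$\\%\\^\\&amp;\\*\\(\\)_\\-\\=\\+\\\\\\/\\?\\.\\:\\;\\'\\,]*)?";
// Assumed declaration: the snippet uses URL_PATTERN without showing where it is defined.
private static final Pattern URL_PATTERN = Pattern.compile(URL_REGEX);

Matcher matcher = URL_PATTERN.matcher(text);
text = matcher.replaceAll("<a href=\"$0\">$0</a>");
return text;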


2 Comments

Toto Over a year ago

Declaring & instead of &amp; would suffice, because a, m and p are already in the range a-z and ; is declared twice.

Mahmoud Saleh Over a year ago

This pattern works fine for most cases, but it didn't catch this one: (www.opensocial.org)

moinudin · Accepted Answer · 2010-12-23 13:36:59Z

2

The problem is that you are using - inside a character class ([]) without escaping it, so it defines the range .-? (i.e. the characters ./0123456789:;<=>?) instead of matching a literal hyphen. Either escape it (\\- in a Java string literal) or put it at the end of the character class so that it cannot form a range.

public static String detectUrls(String text) {
    String newText = text
            .replaceAll("(?:https?|ftps?|http?)://[\\w/%.\\-?&=]+",
                    "<a href='$0'>$0</a>").replaceAll(
                    "(www\\.)[\\w/%.\\-?&=]+", "<a href='http://$0'>$0</a>");
    return newText;
}
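
To see the difference in isolation, here is a minimal sketch that compares the unescaped and escaped forms on single characters. The class and method names (HyphenRangeDemo, matches) are made up for illustration and are not part of the question's code.

```java
import java.util.regex.Pattern;

class HyphenRangeDemo {
    // Unescaped: ".-?" is read as a range from '.' (0x2E) to '?' (0x3F).
    static final Pattern RANGE = Pattern.compile("[.-?]");
    // Escaped: '.', '-' and '?' are three literal characters.
    static final Pattern LITERAL = Pattern.compile("[.\\-?]");

    // True when the single character c is matched by the character class p.
    static boolean matches(Pattern p, char c) {
        return p.matcher(String.valueOf(c)).matches();
    }

    public static void main(String[] args) {
        for (char c : new char[] {'-', ':', '<', '5'}) {
            System.out.println("'" + c + "' range=" + matches(RANGE, c)
                    + " literal=" + matches(LITERAL, c));
        }
    }
}
```

The hyphen itself (0x2D) sits just below the range, so the unescaped class rejects it while accepting characters such as : and <. That is why a URL like http://code.google.com/p/shindig-dnd/ is cut off at the hyphen.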


10 Comments

Mahmoud Saleh Over a year ago

@marcog: there's actually one pattern that's still not caught: something like http: //www.google.com

moinudin Over a year ago

@sword Is that space after http: a typo?

Mahmoud Saleh Over a year ago

@marcog, yes, I added it on purpose, because without the space the editor converts it to google.com. I inserted the space to skip the editor's formatting. You know what I mean, right?

Mahmoud Saleh Over a year ago

@marcog, what do you suggest?

moinudin Over a year ago

@sword Swap the replaceAll() calls around and use negative lookbehind. Here's it working: http://ideone.com/Dj6ew with one minor issue - it also adds http:// in front of the displayed URL. This is a limitation of regular expressions, and to fix it you'll have to parse the text in one pass without regular expressions.
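
One possible reading of this suggestion, as a rough sketch: it is not necessarily what the linked snippet does, and the class name SwappedReplaceDemo is hypothetical. The first pass only prefixes a scheme to bare www. links, using a negative lookbehind so that links which already follow :// are left alone. The second pass then wraps everything that has a scheme.

```java
class SwappedReplaceDemo {
    static String detectUrls(String text) {
        return text
                // Pass 1: give bare "www." links a scheme, unless "://" already precedes them.
                .replaceAll("(?<!://)www\\.[\\w/%.\\-?&=]+", "http://$0")
                // Pass 2: wrap every URL that now carries a scheme in an anchor tag.
                .replaceAll("(?:https?|ftps?)://[\\w/%.\\-?&=]+", "<a href='$0'>$0</a>");
    }

    public static void main(String[] args) {
        System.out.println(detectUrls("see www.google.com and http://www.google.com"));
    }
}
```

As the comment notes, the visible link text for www.google.com also gains the http:// prefix with this approach.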

Toto · Accepted Answer · 2010-12-23 14:03:24Z

1

As marcog said, you should escape the - (or move it to the end of the character class). To match the last two examples you gave, you have to make the http part optional. Also, http? matches htt, which is not a valid protocol.

So the regex will be:

"(?:(?:https?|ftps?)://)?[\\w/%.?&=-]+"

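A caveat and a sketch. With the whole scheme group optional, a pattern of this shape can also match ordinary words such as "Greetings", since nothing else anchors the match. The variant below is an assumption, not the pattern from this answer: it requires either a scheme or a leading www., adds + to the character class so that a link like Gadgets+and+JIRA+Portlets is matched in full, and does the replacement in a single pass with Matcher.appendReplacement so that only the href receives the default http://. The class name SinglePassLinker is hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SinglePassLinker {
    // Group 1 captures the scheme when present; otherwise the match must start with "www.".
    private static final Pattern URL = Pattern.compile(
            "(?:((?:https?|ftps?)://)|www\\.)[\\w/%.?&=+-]+");

    static String detectUrls(String text) {
        Matcher m = URL.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String url = m.group();
            // Only the href gets a default scheme; the visible text stays as typed.
            String href = (m.group(1) != null) ? url : "http://" + url;
            String link = "<a href='" + href + "'>" + url + "</a>";
            // quoteReplacement keeps any '$' or '\' in the link from being read as a group reference.
            m.appendReplacement(out, Matcher.quoteReplacement(link));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(detectUrls("(www.opensocial.org/) and http://www.google.com"));
    }
}
```

Because each URL is matched exactly once, http://www.google.com is not linked a second time by a separate www. pass. Trailing punctuation such as a final full stop would still be swallowed by the character class, so treat this as a starting point only.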