I have to transform URLs entered in plain text into HTML links, and the text may contain more than one URL.

This: Hi here is a link for you: http://www.google.com. Hope it works.

Will become: Hi here is a link for you: <a href='http://www.google.com'>http://www.google.com</a>. Hope it works.

I found this code:

public String transformURLIntoLinks(String text) {
    String urlValidationRegex = "(https?|ftp)://(www\\d?|[a-zA-Z0-9]+)?.[a-zA-Z0-9-]+(\\:|.)([a-zA-Z0-9.]+|(\\d+)?)([/?:].*)?";
    Pattern p = Pattern.compile(urlValidationRegex);
    Matcher m = p.matcher(text);
    StringBuffer sb = new StringBuffer();
    while (m.find()) {
        String found = m.group(0);
        m.appendReplacement(sb, "<a href='" + found + "'>" + found + "</a>");
    }
    m.appendTail(sb);
    return sb.toString();
}

It was posted here: https://stackoverflow.com/a/17704902

It works perfectly for all URLs properly prefixed with http, but I also want to find URLs that start with just www.

Can anyone who knows regex help me out?

Try beginning your regex with (?:(https?|ftp):\/\/)? and it will also match URLs starting with just www. Commented Mar 22, 2018 at 10:22

3 Answers

Make the (https?|ftp):// part optional by wrapping it in a group and appending a question mark ?, which gives ((https?|ftp)://)?
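As a small illustration of the optional group, here is a sketch that checks a deliberately simplified pattern against a URL with and without a protocol. The class name OptionalProtocolDemo, the method isUrl and the simplified host pattern are assumptions made for this illustration; they are not the full regex from this answer.

```java
import java.util.regex.Pattern;

class OptionalProtocolDemo {
    // Simplified, illustrative pattern: an optional protocol group
    // followed by a dotted host name such as www.google.com.
    static final Pattern URL =
        Pattern.compile("((https?|ftp)://)?[a-zA-Z0-9-]+(\\.[a-zA-Z0-9-]+)+");

    // True only if the whole string matches the pattern.
    static boolean isUrl(String s) {
        return URL.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isUrl("http://www.google.com")); // protocol present
        System.out.println(isUrl("www.google.com"));        // protocol omitted
        System.out.println(isUrl("://www.google.com"));     // half a protocol
    }
}
```

Because the ? applies to the whole group, the protocol is matched either completely or not at all, so the first two inputs are accepted and the third is rejected.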

Use this RegEx:

\b((https?|ftp):\/\/)?[-a-zA-Z0-9@:%._\+~#=]{2,256}\.[A-Za-z]{2,6}\b(\/[-a-zA-Z0-9@:%_\+.~#?&//=]*)*(?:\/|\b)

In a Java string literal, escape each backslash (\) by doubling it:

\\b((https?|ftp):\\/\\/)?[-a-zA-Z0-9@:%._\\+~#=]{2,256}\\.[A-Za-z]{2,6}\\b(\\/[-a-zA-Z0-9@:%_\\+.~#?&//=]*)*(?:\\/|\\b)
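For reference, this is roughly how the escaped pattern could be dropped into the method from the question. It is a sketch, not a definitive implementation: the class name UrlLinker is an assumption, and the call to Matcher.quoteReplacement is an addition that is not in the original code.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UrlLinker {
    // The escaped regex from this answer, split over two lines for readability.
    private static final Pattern URL_PATTERN = Pattern.compile(
        "\\b((https?|ftp):\\/\\/)?[-a-zA-Z0-9@:%._\\+~#=]{2,256}\\.[A-Za-z]{2,6}\\b"
        + "(\\/[-a-zA-Z0-9@:%_\\+.~#?&//=]*)*(?:\\/|\\b)");

    static String transformURLIntoLinks(String text) {
        Matcher m = URL_PATTERN.matcher(text);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            String found = m.group(0);
            // quoteReplacement keeps '$' and '\' in a URL from being
            // interpreted as group references in the replacement string.
            m.appendReplacement(sb,
                Matcher.quoteReplacement("<a href='" + found + "'>" + found + "</a>"));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(transformURLIntoLinks(
            "Hi here is a link for you: http://www.google.com. Hope it works."));
        System.out.println(transformURLIntoLinks(
            "Hi here's a link for you www.google.com Take care!"));
    }
}
```

Note that a URL matched without a protocol ends up in the href exactly as written, so a browser would treat it as a relative link unless a protocol is added afterwards.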

Examples

Example 1 (with protocol, in sentence)

Example 2 (without protocol, in sentence)

3 Comments

The text I am transforming does not consist only of URLs. I tried implementing your code, but it turned everything in my text into hrefs. This: Hi here's a link for you www.google.com Take care! turned into: <a href='Hi here's'>Hi here's</a><a href=' a link'> a link</a><a href=' for you'> for you</a><a href='www.google.com'>www.google.com</a><a href='Take care!'>Take care!</a>
@Anders I've edited my answer. I did not know it had to pick a link out of a sentence. It should work now.
Works like a charm. I added some code that adds the protocol and it's all set. Thanks for your help!
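The last comment mentions adding the protocol but does not show the code. One way it might look is sketched here: prefix the href with a scheme whenever the matched text has none. The class HrefProtocol, the helpers withProtocol and toAnchor, and the choice of http:// as the default are all assumptions, not the commenter's actual code.

```java
import java.util.regex.Pattern;

class HrefProtocol {
    // Matches a leading http://, https:// or ftp://, ignoring case.
    private static final Pattern HAS_PROTOCOL =
        Pattern.compile("^(https?|ftp)://", Pattern.CASE_INSENSITIVE);

    // Hypothetical helper: returns the URL unchanged if it already has a
    // protocol, otherwise prepends http:// (an assumed default).
    static String withProtocol(String url) {
        return HAS_PROTOCOL.matcher(url).find() ? url : "http://" + url;
    }

    // Builds the anchor: the href gets the protocol, the visible text stays as typed.
    static String toAnchor(String url) {
        return "<a href='" + withProtocol(url) + "'>" + url + "</a>";
    }

    public static void main(String[] args) {
        System.out.println(toAnchor("www.google.com"));
        System.out.println(toAnchor("https://www.google.com"));
    }
}
```

Inside the find loop, toAnchor(found) would take the place of the hand-built "<a href='...'>" string.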

Make the www optional by wrapping it in a group followed by ?. You can try this:

  final String urlValidationRegex = "(https?|ftp)://(www\\d?)?(|[a-zA-Z0-9]+)?.[a-zA-Z0-9-]+(\\:|.)([a-zA-Z0-9.]+|(\\d+)?)([/?:].*)?";

1 Comment

That is not what the OP is asking for. He wants to be able to omit the protocol, not the subdomain.

You could try the following pattern.

((https?|ftp)://)?(www\d?|[a-zA-Z0-9]+)?.[a-zA-Z0-9-]+(:|.)([a-zA-Z0-9.]+|(\d+)?)([/?:].*)?

The updated code will be

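import java.util.regex.Matcher;
import java.util.regex.Pattern;

public String transformURLIntoLinks(String text) {
    // The leading ((https?|ftp)://)? makes the protocol optional.
    String urlValidationRegex = "((https?|ftp)://)?(www\\d?|[a-zA-Z0-9]+)?.[a-zA-Z0-9-]+(\\:|.)([a-zA-Z0-9.]+|(\\d+)?)([/?:].*)?";
    Pattern p = Pattern.compile(urlValidationRegex);
    Matcher m = p.matcher(text);
    StringBuffer sb = new StringBuffer();
    while (m.find()) {
        String found = m.group(0);
        // quoteReplacement keeps '$' and '\' in a URL from being read as group references.
        m.appendReplacement(sb, Matcher.quoteReplacement("<a href='" + found + "'>" + found + "</a>"));
    }
    m.appendTail(sb);
    return sb.toString();
}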

2 Comments

Why use (something|) when you can just do (something)??
Yes, using a meta-character is always better. Updated the post, thanks.
