7

I need a regex string to match URLs that start with "http://", "https://", or "www.", as well as bare domains such as "google.com".

The code I tried is:

//Pattern to check if this is a valid URL address
    Pattern p = Pattern.compile("(http://|https://)(www.)?([a-zA-Z0-9]+).[a-zA-Z0-9]*.[a-z]{3}.?([a-z]+)?");
    Matcher m;
    m=p.matcher(urlAddress);

But this code only matches URLs such as "http://www.google.com".

I know this may be a duplicate question, but I have tried all of the regexes provided and none of them suits my requirement. Will someone please help me? Thank you.

2
  • Is your requirement that the URL must start with one of "http://", "https://", "www.", or "google.com"? Commented Jul 24, 2014 at 2:19
  • No, not only the Google website. google.com is just an example of a website without "www." or "http"/"https" in its URL. Commented Jul 24, 2014 at 2:37

6 Answers 6

19

You need to make the (http://|https://) part of your regex optional.

^(http:\/\/|https:\/\/)?(www.)?([a-zA-Z0-9]+).[a-zA-Z0-9]*.[a-z]{3}.?([a-z]+)?$

DEMO


3 Comments

Even simpler: ^(https?:\/\/)?(www.)?([a-zA-Z0-9]+).[a-zA-Z0-9]*.[a-z]{3}.?([a-z]+)?$
The correct version is ^(http:\/\/|https:\/\/)?(www.)?([a-zA-Z0-9]+).[a-zA-Z0-9]*.[a-z]{3}\.([a-z]+)?$
This regex does not accept a slash, e.g. https://www.google.com/123. It also does not accept multiple key-value pairs, e.g. https://www.google.com?key1=value1&&key2=value2.
11

You can use the Apache Commons Validator library (org.apache.commons.validator.routines.UrlValidator) to validate a URL:

String[] schemes = {"http","https"};
UrlValidator urlValidator = new UrlValidator(schemes);

And use:

urlValidator.isValid(urlAddress)

Then there is no need for a regex.

Link: https://commons.apache.org/proper/commons-validator/apidocs/org/apache/commons/validator/routines/UrlValidator.html

2 Comments

You might need a regex to avoid an exception if someone tries to enter "http:\\" or "http:/".
This validator doesn't allow underscores in host names.
9

If you use Java, I recommend this regex (I wrote it myself):

^(https?:\/\/)?(www\.)?([\w]+\.)+[\w]{2,63}\/?$
"^(https?:\\/\\/)?(www\\.)?([\\w]+\\.)+[\\w]{2,63}\\/?$" // as Java-String

To explain:

  • ^ = line start
  • (https?:\/\/)? = "http://" or "https://" may occur.
  • (www\.)? = "www." may occur.
  • ([\w]+\.)+ = a word ([a-zA-Z0-9_]) followed by a dot has to occur one or more times. (Extend here if you need special characters like ü, ä, ö or others in your URL; remember to use IDN.toASCII(url) if you use special characters. If you need to know which characters are legal in general: https://kb.ucla.edu/articles/what-characters-can-go-into-a-valid-http-url)
  • [\w]{2,63} = a word ([a-zA-Z0-9_]) with 2 to 63 characters has to occur exactly once. (A TLD (top-level domain, for example .com) cannot be shorter than 2 or longer than 63 characters.)
  • \/? = a "/" character may occur. (Some people or servers put a / at the end.)
  • $ = line end
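As a rough illustration of the pattern explained above, here is a minimal Java sketch that compiles the Java-string form and tries it on a few inputs. The class name SimpleUrlPattern, the method name looksLikeUrl, and the sample inputs are placeholders chosen for this example.

```java
import java.util.regex.Pattern;

final class SimpleUrlPattern {
    // Java-string form of the regex above: every regex backslash is doubled.
    private static final Pattern URL = Pattern.compile(
            "^(https?:\\/\\/)?(www\\.)?([\\w]+\\.)+[\\w]{2,63}\\/?$");

    // Hypothetical helper: true if the whole input matches the pattern.
    static boolean looksLikeUrl(String candidate) {
        return URL.matcher(candidate).matches();
    }

    public static void main(String[] args) {
        String[] samples = {
            "http://www.google.com",      // scheme and www.
            "https://google.com",         // scheme only
            "www.google.com",             // www. only
            "google.com",                 // bare domain
            "google",                     // no dot, so no TLD part
            "https://www.google.com/123"  // path, which this pattern does not cover
        };
        for (String sample : samples) {
            System.out.println(sample + " -> " + looksLikeUrl(sample));
        }
    }
}
```

The first four samples should be accepted and the last two rejected, since the pattern requires at least one dot and allows nothing after the TLD except a single trailing slash.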

-

If you extend it with special characters, it could look like this:

^(https?:\/\/)?(www\.)?([\w\Q$-_+!*'(),%\E]+\.)+[\w]{2,63}\/?$
"^(https?:\\/\\/)?(www\\.)?([\\w\\Q$-_+!*'(),%\\E]+\\.)+[\\w]{2,63}\\/?$" // as Java-String

The answer by Avinash Raj is not fully correct.

^(http:\/\/|https:\/\/)?(www.)?([a-zA-Z0-9]+).[a-zA-Z0-9]*.[a-z]{3}.?([a-z]+)?$

The dots are not escaped, which means each one matches any character. Also, my version is simpler, and I have never heard of a domain like "test..com" (which that regex actually matches).
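To make the difference concrete, this small Java sketch runs both patterns against inputs that are clearly not domains. The class name UnescapedDotDemo and the input "google#com" are made up for illustration; "test..com" is the example mentioned above.

```java
import java.util.regex.Pattern;

final class UnescapedDotDemo {
    // Pattern from the other answer: each bare '.' matches any character.
    static final Pattern UNESCAPED = Pattern.compile(
            "^(http:\\/\\/|https:\\/\\/)?(www.)?([a-zA-Z0-9]+).[a-zA-Z0-9]*.[a-z]{3}.?([a-z]+)?$");

    // Pattern from this answer: each '\\.' matches only a literal dot.
    static final Pattern ESCAPED = Pattern.compile(
            "^(https?:\\/\\/)?(www\\.)?([\\w]+\\.)+[\\w]{2,63}\\/?$");

    public static void main(String[] args) {
        for (String input : new String[] {"test..com", "google#com", "google.com"}) {
            System.out.println(input
                    + " unescaped=" + UNESCAPED.matcher(input).matches()
                    + " escaped=" + ESCAPED.matcher(input).matches());
        }
    }
}
```

With unescaped dots, "test..com" and "google#com" are both accepted, because '.' stands for any character. The escaped version rejects both and still accepts "google.com".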

Demo: https://regex101.com/r/vM7wT6/279


Edit: Since I saw some people needing a regex that also matches server directories, I wrote this:

^(https?:\/\/)?([\w\Q$-_+!*'(),%\E]+\.)+(\w{2,63})(:\d{1,4})?([\w\Q/$-_+!*'(),%\E]+\.?[\w])*\/?$

While this may not be the best one, since I didn't spend much time on it, maybe it helps someone. You can see how it works here: https://regex101.com/r/vM7wT6/700. It also matches URLs like "hello.to/test/whatever.cgi".
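For use from Java, the directory-aware regex could be written as a string along these lines. This is only a sketch: the class name PathUrlPattern, the method name matchesUrl, and the sample inputs other than "hello.to/test/whatever.cgi" are assumptions made for the example, and it relies on Java accepting \Q...\E inside a character class.

```java
import java.util.regex.Pattern;

final class PathUrlPattern {
    // Java-string form of the directory-aware regex: optional scheme, dotted host,
    // TLD, optional port, then path segments built from the allowed characters.
    private static final Pattern URL_WITH_PATH = Pattern.compile(
            "^(https?:\\/\\/)?([\\w\\Q$-_+!*'(),%\\E]+\\.)+(\\w{2,63})(:\\d{1,4})?"
            + "([\\w\\Q/$-_+!*'(),%\\E]+\\.?[\\w])*\\/?$");

    // Hypothetical helper: true if the whole input matches the pattern.
    static boolean matchesUrl(String candidate) {
        return URL_WITH_PATH.matcher(candidate).matches();
    }

    public static void main(String[] args) {
        String[] samples = {
            "hello.to/test/whatever.cgi",   // example from the answer
            "https://www.google.com/123",   // scheme, host and a path
            "example.com:8080/index.html",  // host with a port (made-up input)
            "hello"                         // no dot at all
        };
        for (String sample : samples) {
            System.out.println(sample + " -> " + matchesUrl(sample));
        }
    }
}
```

The path group nests one quantifier inside another, so it may be worth keeping inputs short or length-checking them first if they come from untrusted users.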


3

A Java-compatible version of @Avinash's answer would be:

//Pattern to check if this is a valid URL address
Pattern p = Pattern.compile("^(http://|https://)?(www.)?([a-zA-Z0-9]+).[a-zA-Z0-9]*.[a-z]{3}.?([a-z]+)?$");
Matcher m = p.matcher(urlAddress);
boolean matches = m.matches();


1
pattern="w{3}\.[a-z]+\.?[a-z]{2,3}(|\.[a-z]{2,3})"

This will only accept addresses such as www.google.com and www.google.co.in.
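A quick way to try this pattern from Java is sketched below, assuming it is applied with a full match (which is what String.matches and Matcher.matches do). The class name WwwOnlyPattern and the method name accepts are placeholders for this example.

```java
import java.util.regex.Pattern;

final class WwwOnlyPattern {
    // Same pattern as above, written as a Java string (backslashes doubled).
    private static final Pattern WWW_ONLY = Pattern.compile(
            "w{3}\\.[a-z]+\\.?[a-z]{2,3}(|\\.[a-z]{2,3})");

    // Hypothetical helper: true only if the entire input matches.
    static boolean accepts(String candidate) {
        return WWW_ONLY.matcher(candidate).matches();
    }

    public static void main(String[] args) {
        for (String input : new String[] {
                "www.google.com", "www.google.co.in", "google.com", "http://www.google.com"}) {
            System.out.println(input + " -> " + accepts(input));
        }
    }
}
```

Inputs without a leading "www." or with a scheme in front are rejected, so this does not cover the bare-domain and "http://" cases from the question.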


-1

// I use this:

static boolean esURL(String cadena) {
    // Accepts http://, https://, ftp://, file:// or a leading "www."
    return cadena.matches("\\b(https?://|ftp://|file://|www\\.)[-a-zA-Z0-9+&@#/%?=~_|!:,.;]*[-a-zA-Z0-9+&@#/%=~_|]");
}
