2

I've been trying myself, and searching online, to write this regular expression but without success.

I need to validate that a given URL is from a specific domain and a well-formed link (in PHP). For example:

Good Domain: example.com

So good URLs from example.com:

So bad URLs not from example.com:

Some notes: I don't care about "http" versus "https", but if it matters to you, assume "http" always. The code that will use this regex is PHP, so extra points for that.

UPDATE 2010:

Gruber adds a great URL regex:

(?i)\b((?:[a-z][\w-]+:(?:/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'".,<>?«»“”‘’]))

See his post: An Improved Liberal, Accurate Regex Pattern for Matching URLs

4
  • Your "Good Domain" example is not a valid URL (missing path). Commented Jul 2, 2009 at 14:04
  • @Nikolar Ruhe: The path actually is optional: "http://" hostport [ "/" hpath [ "?" search ]] (see RFC 1738) Commented Jul 2, 2009 at 14:07
  • That's not indicating a valid URL, rather it is indicating the valid domain used by the example URLs but maybe I should just say 'blah.com' and no more. Either way, I think the point is made. Commented Jul 2, 2009 at 14:08
  • is example.com:25 good or bad? And [email protected] ? Commented Jul 2, 2009 at 14:54

5 Answers

7

Do you have to use a regex? PHP has a lot of built-in functions for doing this kind of thing.

    filter_var($url, FILTER_VALIDATE_URL)

will tell you if a URL is valid, and

    $domain = parse_url($url, PHP_URL_HOST);

will tell you the domain it refers to.

It might be clearer and more maintainable than some mad regex.
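
A minimal sketch of how these two calls could be combined into one check. The function name `isUrlFromDomain` and the `blah.com` sample domain are assumptions made for illustration, not part of the answer above; the host comparison accepts the exact domain or any subdomain of it.

```php
<?php
// Hypothetical helper: true if $url is well-formed and its host is
// $domain itself or a subdomain of $domain.
function isUrlFromDomain(string $url, string $domain): bool
{
    // filter_var() returns false when the URL fails validation.
    if (filter_var($url, FILTER_VALIDATE_URL) === false) {
        return false;
    }

    // parse_url() gives null when the host is absent and false when the
    // URL is seriously malformed, so accept only a non-empty string.
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host) || $host === '') {
        return false;
    }

    // Host names are case-insensitive.
    $host   = strtolower($host);
    $domain = strtolower($domain);

    // Either an exact match, or a suffix match that starts on a dot,
    // so "fooblah.com" does not pass as "blah.com".
    $suffix = '.' . $domain;
    return $host === $domain
        || substr($host, -strlen($suffix)) === $suffix;
}

var_dump(isUrlFromDomain('http://www.blah.com/page', 'blah.com'));     // bool(true)
var_dump(isUrlFromDomain('http://fooblah.com/', 'blah.com'));          // bool(false)
var_dump(isUrlFromDomain('http://blah.com.evil.domain/', 'blah.com')); // bool(false)
var_dump(isUrlFromDomain('not/even/a/blah.com/url', 'blah.com'));      // bool(false)
```

Comparing the parsed host as a plain string keeps the domain rule separate from URL syntax, which is the maintainability point made above.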


5

My stab at it

<?php

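// Scheme, any number of subdomain labels, the literal host, then an optional path.
// The "i" flag is added because host names are case-insensitive.
$pattern = '#^https?://([a-z0-9-]+\.)*blah\.com(/.*)?$#i';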

$tests = array(
    'http://blah.com/so/this/is/good'
  , 'http://blah.com/so/this/is/good/index.html'
  , 'http://www.blah.com/so/this/is/good/mice.html#anchortag'
  , 'http://anysubdomain.blah.com/so/this/is/good/wow.php'
  , 'http://anysubdomain.blah.com/so/this/is/good/wow.php?search=doozy'
  , 'http://any.sub-domain.blah.com/so/this/is/good/wow.php?search=doozy' // I added this case
  , 'http://999.sub-domain.blah.com/so/this/is/good/wow.php?search=doozy' // I added this case
  , 'http://obviousexample.com'
  , 'http://bbc.co.uk/blah.com/whatever/you/get/the/idea'
  , 'http://blah.com.example'
  , 'not/even/a/blah.com/url'
);

foreach ( $tests as $test )
{
  if ( preg_match( $pattern, $test ) )
  {
    echo $test, " <strong>matched!</strong><br>";
  } else {
    echo $test, " <strong>did not match.</strong><br>";
  }
}

//  Here's another way
echo '<hr>';
foreach ( $tests as $test )
{
  if ( $filtered = filter_var( $test, FILTER_VALIDATE_URL ) )
  {
    $host = parse_url( $filtered, PHP_URL_HOST );
    // Anchor on a label boundary so a host like "fooblah.com" is rejected.
    if ( $host && preg_match( '/(^|\.)blah\.com$/i', $host ) )
    {
      echo $filtered, " <strong>matched!</strong><br>";
    } else {
      echo $filtered, " <strong>did not match.</strong><br>";
    }
  } else {
    echo $test, " <strong>did not match.</strong><br>";
  }
}

4 Comments

The docs for the parse_url function state that it isn't meant to validate URLs: invalid URLs may still get parsed. So you need some additional checks.
Oh, I agree - it probably needs more rigorous testing. Still, my regex solution works just as well.
I adopted the logic of your post into my 2nd algo. Seems to work well!
Brilliant Peter :) - exactly what I was looking for.
1

Perhaps:

^https?://[^/]*blah\.com(|/.*)$

Edit:

Protect against http://editblah.com

^https?://(([^/]*\.)|)blah\.com(|/.*)$
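
To see what the edit changes, both patterns can be run against a few URLs. This is only a sketch: the `#` delimiters, the variable names, and the sample URLs are choices made for illustration.

```php
<?php
// The two patterns from this answer, wrapped in "#" delimiters.
$before = '#^https?://[^/]*blah\.com(|/.*)$#';
$after  = '#^https?://(([^/]*\.)|)blah\.com(|/.*)$#';

$urls = array(
    'http://blah.com',
    'http://www.blah.com/index.html',
    'http://editblah.com',
);

// Print 1 (match) or 0 (no match) for each pattern.
foreach ($urls as $url) {
    printf(
        "%-32s before: %d  after: %d\n",
        $url,
        preg_match($before, $url),
        preg_match($after, $url)
    );
}
```

Only the last URL differs: the first pattern accepts `http://editblah.com`, while the edited one requires either nothing or a dot-terminated prefix before `blah.com` and so rejects it.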

1 Comment

Close! But this would false positive a domain like fooblah.com
0
\b(https?)://([-A-Z0-9]+\.)*blah.com(/[-A-Z0-9+&@#/%=~_|!:,.;]*)?(\?[A-Z0-9+&@#/%=~_|!:,.;]*)?

1 Comment

I think that would allow blah.com.evil.domain (assuming the A-Z is A-Za-z)
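
The concern in this comment can be checked directly. In the sketch below the pattern is copied from the answer and wrapped in `{}` delimiters with the `i` flag (an assumption that follows the comment's reading of A-Z as case-insensitive). The `$anchored` variant and the test URLs are illustrative additions, not something the answer proposed.

```php
<?php
// Pattern as posted: no end anchor, and the dot in "blah.com" is unescaped.
$pattern = '{\b(https?)://([-A-Z0-9]+\.)*blah.com(/[-A-Z0-9+&@#/%=~_|!:,.;]*)?(\?[A-Z0-9+&@#/%=~_|!:,.;]*)?}i';

var_dump(preg_match($pattern, 'http://blah.com.evil.domain')); // int(1)
var_dump(preg_match($pattern, 'http://blahxcom.evil.domain')); // int(1)

// Illustrative variant: anchored at both ends, with the dot escaped.
$anchored = '{^(https?)://([-A-Z0-9]+\.)*blah\.com(/[-A-Z0-9+&@#/%=~_|!:,.;]*)?(\?[A-Z0-9+&@#/%=~_|!:,.;]*)?$}i';

var_dump(preg_match($anchored, 'http://blah.com.evil.domain')); // int(0)
var_dump(preg_match($anchored, 'http://www.blah.com/a?b=1'));   // int(1)
```

Without `$`, the match is allowed to stop right after `blah.com`, so anything may follow the host; the unescaped dot additionally lets any character stand between `blah` and `com`.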
0
!^https?://(?:[a-zA-Z0-9-]+\.)*blah\.com(?:/[^#]*(?:#[^#]+)?)?$!
