19

I have a bunch of domain names coming in like this:

http://subdomain.example.com (example.com is always example.com, but the subdomain varies).

I need "subdomain".

Could some kind person who had the patience to learn regex help me out?

1
  • Yes, you can have string.string.domain.gtld Commented Dec 1, 2014 at 4:28

7 Answers

52

The problem with the other regexes here is that if you do not know the protocol or the domain suffix, you will get some unexpected results. Here is a little regex that accounts for those situations. :D

/(?:http[s]*\:\/\/)*(.*?)\.(?=[^\/]*\..{2,5})/i  //javascript

This should always return your subdomain (if present) in group 1. With several subdomain levels it captures only the first label. Here it is in a JavaScript example, but it should also work in any other engine that supports positive look-ahead assertions:

// EXAMPLE of use
var regex = /(?:http[s]*\:\/\/)*(.*?)\.(?=[^\/]*\..{2,5})/i
  , whoKnowsWhatItCouldBe = [
        "www.mydomain.com/whatever/my-site" // matches: www
      , "mydomain.com" // does not match
      , "http://mydomain.com" // does not match
      , "https://mydomain.com" // does not match
      , "banana.com/somethingelse" // does not match
      , "https://banana.com/somethingelse.org" // does not match
      , "http://what-ever.mydomain.mu" // matches: what-ever
      , "dev-www.thisdomain.com/whatever" // matches: dev-www
      , "hot-MamaSitas.SomE_doma-in.au.xxx" // matches: hot-MamaSitas
      , "http://hot-MamaSitas.SomE_doma-in.au.xxx" // matches: hot-MamaSitas
      , "пуст.пустыня.ru" // even non-English chars! Woohoo! matches: пуст
      , "пустыня.ru" // does not match
    ];

// Run a loop and test it out.
for (var i = 0, length = whoKnowsWhatItCouldBe.length; i < length; i++) {
    var result = whoKnowsWhatItCouldBe[i].match(regex);
    if (result != null) {
        // YAY! We have a match! The subdomain is in group 1.
        console.log(whoKnowsWhatItCouldBe[i] + " -> " + result[1]);
    } else {
        // Boo... No subdomain was found
        console.log(whoKnowsWhatItCouldBe[i] + " -> no subdomain");
    }
}

7 Comments

this is clearly the best answer because it accounts for protocol, none/multiple subdomains, and it is domain independent.
I wonder what the desired output would be for multiple subdomains... Would you want it to return one.two or just one? I suppose we could tweak the regex to pull all (.\.) groups prior to the domain... maybe later
Nice job, +1. (file:\/\/|http:\/\/|https:\/\/|\/\/)*(.*?)\.(?=[^\/]*\..{2,5}) if you want to allow other protocols
This worked in Google Analytics to filter by subdomain. I had to drop the leading / and the trailing /i: (?:http[s]*\:\/\/)*(.*?)\.(?=[^\/]*\..{2,5})
@WebandFlow, The result SomE_doma-in is the subdomain of your example, is it not? I am unclear what you had expected, vs. what you got. I personally expect SomE_doma-in as the match...
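
On the multiple-subdomain question raised in these comments, here is a minimal sketch that returns every label in front of the domain instead of only the first one. It assumes the domain itself is always the last two dot-separated labels, which is wrong for suffixes such as co.uk, and the name allSubdomains is hypothetical rather than anything from the answer.

```javascript
// Hypothetical helper: returns "one.two" for "http://one.two.example.com".
// Assumption: the domain itself is exactly the last two dot-separated labels.
function allSubdomains(input) {
  // Drop an optional scheme, then keep everything before the first "/", "?", "#" or ":".
  var host = input.replace(/^[a-z][a-z0-9+.-]*:\/\//i, "").split(/[\/?#:]/)[0];
  var labels = host.split(".");
  // Fewer than three labels means there is no subdomain.
  return labels.length > 2 ? labels.slice(0, -2).join(".") : null;
}

console.log(allSubdomains("http://one.two.example.com")); // one.two
console.log(allSubdomains("www.mydomain.com/whatever"));  // www
console.log(allSubdomains("https://mydomain.com"));       // null
```

This swaps the regex capture for plain string splitting, so it is a different technique from the answer's look-ahead pattern, not a tweak of it.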
24
/(http:\/\/)?(([^.]+)\.)?domain\.com/

Then $3 (or \3) will contain "subdomain" if one was supplied.

If you want to have the subdomain in the first group, and your regex engine supports non-capturing groups (shy groups), use this as suggested by palindrom:

/(?:http:\/\/)?(?:([^.]+)\.)?domain\.com/
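
A minimal JavaScript sketch of how the non-capturing version might be used. The helper name subdomainOf is hypothetical; the regex is copied from above. Because the pattern is unanchored, it will also find domain.com in the middle of a longer string.

```javascript
// Regex copied from the answer; group 1 holds the subdomain when one is present.
var re = /(?:http:\/\/)?(?:([^.]+)\.)?domain\.com/;

// Hypothetical helper: returns the subdomain, or null if there is none or no match.
function subdomainOf(url) {
  var m = url.match(re);
  // An optional group that did not participate is undefined in JavaScript.
  return m && m[1] !== undefined ? m[1] : null;
}

console.log(subdomainOf("http://subdomain.domain.com")); // subdomain
console.log(subdomainOf("http://domain.com"));           // null
console.log(subdomainOf("http://example.org"));          // null
```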

3 Comments

True. He didn't mention language/library so I wanted to make the regex as portable as possible - not sure if all implementations allow non-capturing groups.
What if you don't know what the domain is?
@DallasClark In that case, I would recommend my answer below
6

Purely the subdomain string (result is $1):

^http://([^.]+)\.domain\.com

Making http:// optional (result is $2):

^(http://)?([^.]+)\.domain\.com

Making the http:// and the subdomain optional (result is $3):

(http://)?(([^.]+)\.)?domain\.com
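
To make the group numbering concrete, here is a small JavaScript sketch that runs the three patterns and reads the group named for each. The names patterns and extract are hypothetical, and the slashes are escaped only because JavaScript regex literals require it.

```javascript
// The three patterns from the answer, with "/" escaped for regex literals.
// "group" is the capture group the answer says holds the subdomain.
var patterns = [
  { re: /^http:\/\/([^.]+)\.domain\.com/, group: 1 },
  { re: /^(http:\/\/)?([^.]+)\.domain\.com/, group: 2 },
  { re: /(http:\/\/)?(([^.]+)\.)?domain\.com/, group: 3 }
];

// Hypothetical helper: run pattern i against url and return its subdomain group.
function extract(i, url) {
  var m = url.match(patterns[i].re);
  if (m === null || m[patterns[i].group] === undefined) {
    return null;
  }
  return m[patterns[i].group];
}

console.log(extract(0, "http://subdomain.domain.com")); // subdomain
console.log(extract(1, "subdomain.domain.com"));        // subdomain
console.log(extract(2, "http://domain.com"));           // null
```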


2

It should just be

\Qhttp://\E(\w+)\.domain\.com

The subdomain will be the first group.
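
The \Q...\E quoting works in Perl-style engines such as PCRE and Java, but JavaScript's RegExp does not support it. A rough equivalent, assuming you build the pattern from strings, is sketched below. Both escapeRegExp and subdomainFor are hypothetical helper names.

```javascript
// Escapes regex metacharacters so a literal string can be embedded in a pattern.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: the equivalent of \Qhttp://\E(\w+)\.<domain>,
// with the literal parts quoted by escapeRegExp instead of \Q...\E.
function subdomainFor(url, domain) {
  var re = new RegExp(escapeRegExp("http://") + "(\\w+)\\." + escapeRegExp(domain));
  var m = url.match(re);
  return m ? m[1] : null;
}

console.log(subdomainFor("http://subdomain.domain.com", "domain.com")); // subdomain
console.log(subdomainFor("http://subdomain.domainXcom", "domain.com")); // null
console.log(subdomainFor("http://dev-www.domain.com", "domain.com"));   // null
```

The last call returns null because \w+ does not match hyphens, so a subdomain such as dev-www is not captured by this pattern.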


0
#!/usr/bin/perl

use strict;
use warnings;

my $s = 'http://subdomain.example.com';
# Split on "//" or "."; the fields are "http:", "subdomain", "example", "com".
my $subdomain = (split qr{/{2}|\.}, $s)[1];

print "'$subdomain'\n";


0

To match subdomains that contain dot characters, I used this one

https?:\/\/?(?:([^*]+)\.)?domain\.com

to capture everything between the protocol and the domain.

https://sub.domain.com (sub)

https://sub.sub.domain.com (sub.sub) ...
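
One thing to watch: [^*] accepts any character except an asterisk, including "/", so the capture can run past the host into the path. The sketch below compares the pattern above with an assumed stricter variant that uses [^\/] and a leading ^. The names loose, strict and firstGroup are hypothetical.

```javascript
// Pattern from the answer: [^*] accepts any character except "*", including "/".
var loose = /https?:\/\/?(?:([^*]+)\.)?domain\.com/;
// Assumed variant: [^\/] keeps the capture inside the host part.
var strict = /^https?:\/\/(?:([^\/]+)\.)?domain\.com/;

// Hypothetical helper: returns group 1, or null when it did not participate.
function firstGroup(re, url) {
  var m = url.match(re);
  return m && m[1] !== undefined ? m[1] : null;
}

console.log(firstGroup(strict, "https://sub.sub.domain.com"));               // sub.sub
console.log(firstGroup(loose, "https://sub.domain.com/path/x.domain.com"));  // sub.domain.com/path/x
console.log(firstGroup(strict, "https://sub.domain.com/path/x.domain.com")); // sub
```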


-1

1st group of

http://(.*).example.com

1 Comment

Forgetting, of course, that .* will match an empty string and, more importantly, that the period stands for any character.
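
Both points in that comment are easy to check. The sketch below runs the pattern from this answer next to an assumed corrected variant with escaped dots and a non-empty, dot-free capture. The variable names loose and escaped are hypothetical.

```javascript
// Pattern from the answer: unescaped "." matches any character, and .* may be empty.
var loose = /http:\/\/(.*).example.com/;
// Assumed fix: literal dots, and at least one character that is not "." or "/".
var escaped = /^http:\/\/([^.\/]+)\.example\.com/;

console.log(loose.test("http://fooXexampleYcom"));          // true
console.log("http://.example.com".match(loose)[1] === "");  // true
console.log(escaped.test("http://fooXexampleYcom"));        // false
console.log(escaped.test("http://.example.com"));           // false
console.log("http://subdomain.example.com".match(escaped)[1]); // subdomain
```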
