I've already seen a bunch of questions on this exact subject, but none seem to solve my problem. I want to create a function that removes everything from a website address except the domain name. For example, if the user inputs http://www.stackoverflow.com/blahblahblah, I want to get stackoverflow; likewise, if the user inputs facebook.com/user/bacon, I want to get facebook.

Does anyone know of a function, or some other way, to remove certain parts of a string? Maybe it could search for http and, when found, remove everything up to and including the //. Then it would search for www and, if found, remove everything up to and including the following dot. Then it would keep everything up to the next dot and remove everything after it. Looking at it now, this might cause problems with sites such as http://www.en.wikipedia.org, because I'd be left with only en.

Any ideas (preferably in PHP, but JavaScript is also welcome)?
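Taken literally, the step-by-step idea above can be sketched with plain string functions, no regular expressions needed. This is a minimal sketch, assuming the input is either a bare host/path or a URL whose scheme ends in "://"; the function name extractByScanning is hypothetical. It also reproduces the www.en.wikipedia.org problem the question anticipates.

```php
<?php
// Hypothetical helper: follows the steps described in the question literally.
function extractByScanning(string $url): string
{
    // Step 1: if a scheme such as "http://" is present, drop it and the "://".
    $pos = strpos($url, '://');
    if ($pos !== false) {
        $url = substr($url, $pos + 3);
    }
    // Step 2: if the remainder starts with "www." (any case), drop that prefix.
    if (strncasecmp($url, 'www.', 4) === 0) {
        $url = substr($url, 4);
    }
    // Step 3: keep everything up to the next dot.
    $dot = strpos($url, '.');
    return $dot === false ? $url : substr($url, 0, $dot);
}

echo extractByScanning('http://www.stackoverflow.com/blahblahblah'), "\n"; // stackoverflow
echo extractByScanning('facebook.com/user/bacon'), "\n";                  // facebook
echo extractByScanning('http://www.en.wikipedia.org'), "\n";              // en
```

The last call shows the limitation: scanning from the left cannot tell a subdomain such as en apart from the name you actually want.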

EDIT 1: Thanks to great feedback I think I've been able to work out a function that does what I want:

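 function getdomain($url) {
    // parse_url() only detects the host when the URL has a scheme, so prepend
    // one if it is missing (e.g. "facebook.com/user/bacon"). Checking for any
    // scheme, rather than only "http", also leaves https:// URLs intact.
    if (!preg_match('#^[a-z][a-z0-9+.-]*://#i', $url)) {
       $url = 'http://'.$url;
    }
    $host = parse_url($url, PHP_URL_HOST);

    // Split the host into its dot-separated labels.
    $remove = explode('.', (string) $host);

    // Take the first label, skipping a leading "www".
    $result = $remove[0];
    if($result == 'www' && isset($remove[1])) {
       $result = $remove[1];
    }

    return $result;
 } 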

It's not perfect, at least where subdomains are concerned, but I think something can be done about that. Maybe I could add a second if statement at the end to check the length of the array: if it has more than two items, choose item 1 instead of item 0. This obviously causes trouble for any domain ending in .co.uk (because that will be three items long, and I don't want to return co). I'll work on it a little more and see what I come up with. I'd be glad if some of you PHP gurus out there could take a look as well. I'm not as skilled or as experienced as any of you... :P
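One way to make leading subdomains such as www or en irrelevant is to count labels from the right instead of the left: the name you want is usually the label just before the TLD. This is a minimal sketch, assuming the TLD is a single label such as .com or .org; as noted above, it returns co for a .co.uk host. The name domainLabel is hypothetical.

```php
<?php
// Hypothetical helper: picks the label immediately to the left of the TLD.
// Assumes a single-label TLD (.com, .org), so "mydomain.co.uk" yields "co".
function domainLabel(string $url): string
{
    // parse_url() only fills in the host when a scheme is present.
    if (!preg_match('#^[a-z][a-z0-9+.-]*://#i', $url)) {
        $url = 'http://' . $url;
    }
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host) || $host === '') {
        return '';
    }
    // Normalise case and drop a trailing dot ("example.com.") before splitting.
    $labels = explode('.', strtolower(rtrim($host, '.')));
    $count = count($labels);
    // A bare name such as "localhost" has no TLD to skip.
    return $count >= 2 ? $labels[$count - 2] : $labels[0];
}

echo domainLabel('http://www.en.wikipedia.org'), "\n";       // wikipedia
echo domainLabel('facebook.com/user/bacon'), "\n";           // facebook
echo domainLabel('https://www.stackoverflow.com/blah'), "\n"; // stackoverflow
```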

  • Without answering your question directly (because it looks like an X-Y Problem), why don't you use the parse_url function? Commented Feb 13, 2013 at 18:17
  • Okay, thanks. I tried looking for answers, but I only found people wanting to split, say, a string containing some preset value. I didn't know PHP had a function that did what I wanted. Thanks very much! :) Commented Feb 13, 2013 at 18:19
  • It'll have to be a routine that has access to a list of current TLDs, or the Public Suffix List, to properly determine where the actual domain-name part you are interested in begins. Commented Feb 13, 2013 at 18:21
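The suffix-list idea from the last comment can be sketched as follows: find the longest known suffix that the host ends with, then take the label just before it. This assumes a tiny hard-coded subset of suffixes (KNOWN_SUFFIXES and registrableLabel are hypothetical names) and expects a host name such as the value of parse_url($url, PHP_URL_HOST). Real code would load the full Public Suffix List from publicsuffix.org and also honour its wildcard (*) and exception (!) rules, which this sketch ignores.

```php
<?php
// Tiny illustrative subset; the real Public Suffix List has thousands of entries
// plus wildcard (*) and exception (!) rules that are not handled here.
const KNOWN_SUFFIXES = ['com', 'org', 'net', 'uk', 'co.uk', 'org.uk'];

// Hypothetical helper: returns the label immediately left of the longest
// matching suffix, or '' when the host is itself a suffix or matches none.
function registrableLabel(string $host): string
{
    $labels = explode('.', strtolower(rtrim($host, '.')));
    $count = count($labels);
    // Try the longest candidate (the whole host) first, then drop one
    // label from the left at a time, so "co.uk" is found before "uk".
    for ($i = 0; $i < $count; $i++) {
        $suffix = implode('.', array_slice($labels, $i));
        if (in_array($suffix, KNOWN_SUFFIXES, true)) {
            // If the whole host is a suffix (e.g. "co.uk"), there is no name.
            return $i > 0 ? $labels[$i - 1] : '';
        }
    }
    return '';
}

echo registrableLabel('www.mysubdomain.mydomain.co.uk'), "\n"; // mydomain
echo registrableLabel('www.en.wikipedia.org'), "\n";           // wikipedia
```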

6 Answers

Answer (score 1)

Use parse_url to split the URL into the different parts. What you need is the hostname. Then you will want to split it by the dot and get the first part:

$url    = 'http://facebook.com/blahblah';
$parts  = parse_url($url);
$host   = $parts['host']; // facebook.com
$foo    = explode('.', $host);
$result = $foo[0]; // facebook

Comments

What about mydomain.co.uk?
@fireeyedboy It will return "mydomain" correctly, because we take the first item of the resulting (in that case 3-element) array.
Eh, yeah, you are right, but what about www.mydomain.co.uk, or mysubdomain.mydomain.co.uk or www.mysubdomain.mydomain.co.uk?
And what about subdomains?
@fireeyedboy You are correct. You could specifically filter out "www." from the hostname, but that wouldn't work for non-standard subdomains. The only solution I can think of in that case is having a full list of possible TLDs and that way filtering out the ending of the hostname.

Answer (score 0)

You can use PHP's parse_url function, which splits a URL into its components, including the host name - see

Answer (score 0)

Use PHP's parse_url function to get the host name (e.g. domain.com) and then replace .com with an empty string. I am a little rusty on my regular expressions, but this should work:

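$url = 'http://www.en.wikipedia.org';
$domain = parse_url($url, PHP_URL_HOST); // www.en.wikipedia.org
// preg_replace() needs pattern delimiters; "$" anchors the match to the end of the host.
$domain = preg_replace('/\.(com|org)$/', '', $domain); // www.en.wikipedia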

http://php.net/manual/en/function.parse-url.php

PHP REGEX: Get domain from URL

http://rubular.com/r/MvyPO9ijnQ (for testing regular expressions)

Answer (score 0)

You're looking for information on regular expressions. They're a bit complicated, so be prepared to read up. In your case, preg_match and preg_replace will serve you best: preg_match searches for a match based on your pattern, and preg_replace replaces the matches with your replacement.

preg_match, preg_replace

I'd start with a pattern like this: find .com, .net or .org and delete it and everything after it. Then find the last . and delete it and everything in front of it. Finally, if // exists, delete it and everything in front of it.

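// preg_replace() returns the new string, so the result must be assigned back
// to $url. No preg_match() guard is needed: nothing changes when nothing matches.

// Strip a leading "http://" or "https://"
$url = preg_replace('/^https?:\/\//i', '', $url);

// Strip a leading "www." (the dot is escaped so it only matches a literal dot)
$url = preg_replace('/^www\./i', '', $url);

// Delete ".com", ".net" or ".org" and everything after it (the path)
$url = preg_replace('/\.(com|net|org)\b.*$/i', '', $url);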

^ = start of the string
i = case-insensitive flag
\ = escape character
$ = end of the string

This will have to be played around with and tweaked, but it should get you pointed in the right direction.
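The separate replacements can also be folded into a single preg_match() call with a capturing group. This is a minimal sketch, assuming the name you want is the first label after an optional scheme and an optional "www." prefix; like the question's own approach, it still returns en for www.en.wikipedia.org. The name firstLabel is hypothetical.

```php
<?php
// Hypothetical helper. The pattern, piece by piece:
//   ^(?:[a-z][a-z0-9+.-]*://)?  optional scheme such as "http://" or "https://"
//   (?:www\.)?                  optional "www." (dot escaped to match a literal dot)
//   ([^./:?#]+)                 capture: everything up to the next dot, slash,
//                               port colon, query string or fragment
function firstLabel(string $url): ?string
{
    if (preg_match('~^(?:[a-z][a-z0-9+.-]*://)?(?:www\.)?([^./:?#]+)~i', $url, $m)) {
        return $m[1];
    }
    return null; // e.g. the input is empty or starts with "/"
}
```

For example, firstLabel('facebook.com/user/bacon') returns facebook, and firstLabel('http://www.stackoverflow.com/blahblahblah') returns stackoverflow.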

Answer (score 0)

Javascript:

document.domain.replace(".com","") // document.domain is the current page's domain, not a user-supplied URL

PHP:

$url = 'http://google.com/something/something';
$parse = parse_url($url);
echo str_replace(".com","", $parse['host']); // prints google
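Note that str_replace() removes every occurrence of ".com", not only the final one, so a host such as my.company.com comes out as mypany. A minimal sketch that strips the suffix only when it ends the host, using plain substr() so it does not depend on a particular PHP version; stripSuffix is a hypothetical name.

```php
<?php
// Hypothetical helper: removes $suffix only if the host ends with it.
function stripSuffix(string $host, string $suffix): string
{
    $len = strlen($suffix);
    // Only strip when the suffix is non-empty and sits at the very end of the host.
    if ($len > 0 && substr($host, -$len) === $suffix) {
        return substr($host, 0, -$len);
    }
    return $host;
}

echo str_replace('.com', '', 'my.company.com'), "\n"; // mypany
echo stripSuffix('my.company.com', '.com'), "\n";     // my.company
echo stripSuffix('google.com', '.com'), "\n";         // google
```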

Answer (score -2)

This is quite a quick method, but it should do what you want in PHP:

function getDomain( $URL ) {
    return explode('.',$URL)[1];
}

I will update it when I get a chance, but basically it splits the URL into pieces at each full stop and returns the second item, which should be the domain. A bit more logic would be required for longer domains such as www.abc.xyz.com, but for normal URLs it would suffice.

2 Comments

It will not work for stackoverflow.com (without www) for example.
You can't subscript a function that returns an array in versions of PHP before 5.4.
