
I need a general script/pattern to extract the main domain name from URLs. I have the following attempt that failed.

Let's say I have link1 below and need to extract the main domain name (google.co.uk) without the sub-domain (mail). I wrote the script below, which works fine with .co.uk but fails for websites whose top-level domain has only one part, such as .com.

Is there a better way to extract the main domain name from ANY URL? The URL is structured as follows:

https://(optional sub-domain)*(domain name with a one- or two-part top-level domain)(optional forward slash followed by text)*

The * means zero or more times.

var link1="https://mail.google.co.uk/link/link/link";
var url = new URL(link1);
var domain = url.hostname.split('.').slice(-3).join('.');
console.log("The domain name is: "+ domain);

In the above code, I expect: google.co.uk

It works here because the top-level domain in this link has two parts (.co.uk), so slice(-3) keeps exactly the right labels. But I need the code to work with this link as well:

var link1="https://mail.google.com/link/link/link";

And I need the output to be: google.com

But the problem is that the code produces:

mail.google.com

And I only want the main domain name: google.com
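To make the failure concrete, here is a small sketch that wraps the same slice(-3) logic in a hypothetical helper, lastThreeLabels (the name is illustrative, not from the question), and runs it on both example links. The .co.uk hostname splits into four labels but the .com hostname into only three, so no fixed count works for both.

```javascript
// Reproduces the question's fixed-width heuristic: always keep the last three labels.
// lastThreeLabels is a hypothetical name used only for this illustration.
function lastThreeLabels(link) {
  const labels = new URL(link).hostname.split("."); // e.g. ["mail", "google", "com"]
  return labels.slice(-3).join(".");
}

console.log(lastThreeLabels("https://mail.google.co.uk/link/link/link")); // google.co.uk
console.log(lastThreeLabels("https://mail.google.com/link/link/link"));   // mail.google.com
```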

EDIT: Here are some examples of the expected output:

1) In mail.google.co.uk it should be: google.co.uk

2) In mail.google.com it should be: google.com

3) In link.mail.google.com/link/link it should be: google.com

4) In link.link2.mail.google.com it should be: google.com

i.e. just the main domain name, without sub-domains or any path after the domain name. The top-level domain can take the form .com, .net, .org, etc., or the form .co.uk, .co.us, etc. It should be captured whether it has one part or two (my code only captures two-part ones).
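Nothing in the hostname string itself says whether "co.uk" or just "uk" is the suffix, so any general solution has to look the suffix up in a list. The following is a minimal sketch of that idea, assuming you supply the multi-part suffixes yourself. The helper name mainDomain and the tiny MULTI_PART_SUFFIXES set are hypothetical and only cover a few illustrative cases; a real implementation would load the full Public Suffix List instead.

```javascript
// ASSUMPTION: this small set is illustrative only. Real code should use the
// complete Public Suffix List rather than a hand-written subset.
const MULTI_PART_SUFFIXES = new Set(["co.uk", "org.uk", "com.au"]);

// Hypothetical helper: returns the registrable ("main") domain for a URL
// or a bare hostname such as "link.mail.google.com/link/link".
function mainDomain(link) {
  // Add a scheme when missing so the URL constructor accepts bare hostnames.
  const withScheme = /^[a-z][a-z0-9+.-]*:\/\//i.test(link) ? link : "https://" + link;
  const labels = new URL(withScheme).hostname.split(".");

  // If the last two labels form a known multi-part suffix, keep three labels;
  // otherwise assume a single-label TLD such as .com and keep two.
  const lastTwo = labels.slice(-2).join(".");
  const keep = MULTI_PART_SUFFIXES.has(lastTwo) ? 3 : 2;
  return labels.slice(-keep).join(".");
}

// The four examples from the question:
console.log(mainDomain("mail.google.co.uk"));              // google.co.uk
console.log(mainDomain("mail.google.com"));                // google.com
console.log(mainDomain("link.mail.google.com/link/link")); // google.com
console.log(mainDomain("link.link2.mail.google.com"));     // google.com
```

This sketch only checks two-label suffixes; the Public Suffix List also contains longer suffixes and wildcard rules, which a production version would need to handle.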

  • what is the expected output of domain from link1? Commented Mar 29, 2018 at 12:39
  • @Nikola Lukic that link is to extract the top-level domain name. I am asking about the main domain name in addition to the top-level domain name. e.g. google.com, google.co.uk. Commented Mar 29, 2018 at 12:51
  • The problem I see with parsing is '.' versus the double dot. You need some validation object that defines concrete rules. For example, treat ".co.uk" as an exception case. The program must know when a one-dot or a two-dot result is valid. Commented Mar 29, 2018 at 13:00
  • @Nikola Lukic it is for any URL. I cannot make an exception. It is not only .co.uk; it can be anything, for example .co.us or any other suffix. Commented Mar 29, 2018 at 13:02
  • Possible duplicate of Issue while capturing Top-Level Domain from URL Commented Mar 29, 2018 at 14:00

1 Answer


Sure, if you want

"mail.google.co.uk"

you can just use

url.host

or if you want it with the scheme in front (e.g. https://mail.google.co.uk), use

url.origin

cheers!
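A quick sketch of how these properties differ. The port 8443 is a hypothetical addition to the question's URL, included only because it is where host, hostname and origin diverge: host keeps a non-default port, hostname never includes one, and origin adds the scheme.

```javascript
// Hypothetical URL: the question's example plus a non-default port (8443).
const url = new URL("https://mail.google.co.uk:8443/link/link/link");

console.log(url.host);     // "mail.google.co.uk:8443"  (hostname plus port)
console.log(url.hostname); // "mail.google.co.uk"       (never includes the port)
console.log(url.origin);   // "https://mail.google.co.uk:8443" (scheme, host and port)
```

Note that none of these strip the sub-domain, so on their own they still return mail.google.co.uk rather than google.co.uk.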
