7

I need to parse the domain name from a string. The string can vary and I need the exact domain.

Example strings:

http://somename.de/
www.somename.de/
somename.de/
somename.de/somesubdirectory
www.somename.de/?pe=12

I need it in the following format with just the domain name, the tld, and the www, if applicable:

www.somename.de

How do I do that using C#?

4 Answers

12

As an alternative to a regex solution, you can let the System.Uri class parse the string for you. You just have to make sure the string contains a scheme.

string uriString = "http://www.google.com/search";

if (!uriString.Contains(Uri.SchemeDelimiter))
{
    uriString = string.Concat(Uri.UriSchemeHttp, Uri.SchemeDelimiter, uriString);
}

string domain = new Uri(uriString).Host;

This solution also strips any port number and converts IPv6 addresses to their canonical form.
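The same idea can be wrapped in a small helper that does not throw on malformed input. This is only a sketch: the class name DomainParser and the method TryGetHost are made up for illustration, and it assumes that any input without "://" should be treated as an HTTP URL. It uses Uri.TryCreate instead of the Uri constructor so that invalid strings produce false rather than a UriFormatException.

```csharp
using System;

public static class DomainParser
{
    // Hypothetical helper (not part of the framework): tries to extract the host
    // from a URL-like string, assuming "http" when no scheme is present.
    public static bool TryGetHost(string input, out string host)
    {
        host = string.Empty;

        if (string.IsNullOrWhiteSpace(input))
        {
            return false;
        }

        string candidate = input.Trim();

        // Prepend a default scheme so System.Uri parses the first segment as the host.
        if (!candidate.Contains(Uri.SchemeDelimiter))
        {
            candidate = Uri.UriSchemeHttp + Uri.SchemeDelimiter + candidate;
        }

        // TryCreate returns false instead of throwing when the string is not a valid absolute URI.
        if (Uri.TryCreate(candidate, UriKind.Absolute, out var uri))
        {
            host = uri.Host;
            return true;
        }

        return false;
    }
}
```

With the strings from the question, DomainParser.TryGetHost("www.somename.de/?pe=12", out string host) should leave "www.somename.de" in host, and "somename.de/somesubdirectory" should give "somename.de".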

3 Comments

Your answer looks valid too.
@AbdulSaboor, what would you expect? The URL contains a host name with a space in it (" blabla") which makes it an invalid host name. Just the "http://" is also an invalid URL. The Uri constructor expects a valid URL.
1. It reports the URL as valid if I remove the space. 2. I tried with only "blabla" and it still says it is a valid URL. I think it should not.
11

I simply used

Uri uri = new Uri("http://www.google.com/search?q=439489");
string host = uri.Host; // Host is already a string, so no ToString() call is needed
return host;

because this way you can be sure of the result.

1 Comment

You can't be so sure, though: your solution also accepts "h t t p : / / h t t p : / /yee" as a correct URL (without the spaces, but Stack Overflow changes the double http:// into one).
2

I checked the Regular Expression Library, and something like this might work for you:

^(([\w][\w\-\.]*)\.)?([\w][\w\-]+)(\.([\w][\w\.]*))?$

2 Comments

@Umair Ashraf - you should probably explain how it doesn't work. Can you give an example of a line it doesn't match?
I put this pattern straight into the Regex constructor, like (@"^(([\w][\w\-\.]*)\.)?([\w][\w\-]+)(\.([\w][\w\.]*))?$")
1

Try this:

^(?:\w+://)?([^/?]*)

This is a weak regex: it doesn't validate the string but assumes it is already a URL, skips the protocol if there is one, and captures everything up to the first slash. The domain is in the first captured group, for example:

using System.Text.RegularExpressions;

string url = "http://www.google.com/hello";
Match match = Regex.Match(url, @"^(?:\w+://)?([^/?]*)");
string domain = match.Groups[1].Value; // "www.google.com"

As a bonus, the capture also stops at the first ?, so the URL google.com?hello=world will work as expected.
