
Could someone explain the purpose of the line host = parsed.netloc.split('@')[-1].split(':')[0] in the following code? I understand that we are trying to get the host name from netloc, but I don't understand why we split on the @ delimiter and then again on the : delimiter.

import urlparse
parsed = urlparse.urlparse('https://www.google.co.uk/search?client=ubuntu&channel=fs')
print parsed
host = parsed.netloc.split('@')[-1].split(':')[0]
print host


Result:

ParseResult(scheme='https', netloc='www.google.co.uk', path='/search', params='', query='client=ubuntu&channel=fs', fragment='')

www.google.co.uk

Surely if one just needs the domain, we can get that from parsed.netloc?

1 Answer

Netloc in its full form can have HTTP authentication credentials and a port number:

login:password@www.google.co.uk:80

See RFC 1808 and RFC 1738.

So we potentially have to split that into ["login:password", "www.google.co.uk:80"], take the last part, split that into ["www.google.co.uk", "80"] and take the hostname.
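The two splits can be traced step by step. This is a minimal sketch that uses a plain string for the netloc, so it runs the same way on Python 2 and Python 3; the variable names are illustrative and not part of the original code.

```python
# A netloc in its full form: credentials, host, and port.
netloc = 'login:password@www.google.co.uk:80'

# Step 1: split on '@' to separate the credentials from host:port.
parts = netloc.split('@')            # ['login:password', 'www.google.co.uk:80']
host_and_port = parts[-1]            # 'www.google.co.uk:80'

# Step 2: split on ':' to separate the host from the port.
host = host_and_port.split(':')[0]   # 'www.google.co.uk'
print(host)
```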

If these parts are omitted, splitting on a delimiter that isn't there does no harm: split() simply returns a one-element list containing the whole string, so there is no need to check first whether the credentials or port are present.
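To illustrate, here is a small sketch that runs the same expression on a URL with and without the optional parts. The helper name extract_host is hypothetical, and the try/except import is only an assumption made so the snippet runs on both Python 2 (as in the question) and Python 3.

```python
try:
    from urllib.parse import urlparse  # Python 3
except ImportError:
    from urlparse import urlparse      # Python 2, as in the question


def extract_host(url):
    """Hypothetical helper wrapping the one-liner from the question."""
    netloc = urlparse(url).netloc
    return netloc.split('@')[-1].split(':')[0]


# No '@' in the string: split() returns a one-element list.
print('www.google.co.uk'.split('@'))

# No credentials, no port: both splits are effectively no-ops.
plain = extract_host('https://www.google.co.uk/search?client=ubuntu&channel=fs')
print(plain)

# Credentials and port present: the same expression strips both.
full = extract_host('https://login:password@www.google.co.uk:80/search')
print(full)
```

Both calls print www.google.co.uk. The parse result also exposes a hostname attribute, which may be worth considering as an alternative to splitting by hand; note that it lowercases the host.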

urlparse documentation
