How to parse URLs using urlparse and split() in Python?

Question

Could someone explain the purpose of the line host = parsed.netloc.split('@')[-1].split(':')[0] in the following code? I understand that we are trying to get the host name from netloc, but I don't understand why we split on the @ delimiter and then again on the : delimiter.

import urlparse
parsed = urlparse.urlparse('https://www.google.co.uk/search?client=ubuntu&channel=fs')
print parsed
host = parsed.netloc.split('@')[-1].split(':')[0]
print host


Result:

ParseResult(scheme='https', netloc='www.google.co.uk', path='/search', params='', query='client=ubuntu&channel=fs', fragment='')

www.google.co.uk

Surely, if one just needs the domain, we can get that directly from parsed.netloc?

Community · Accepted Answer · 2021-10-07 05:59:28Z

Netloc in its full form can have HTTP authentication credentials and a port number:

login:password@www.google.co.uk:80

See RFC 1808 and RFC 1738.

So we potentially have to split that into ["login:password", "www.google.co.uk:80"], take the last part, split that into ["www.google.co.uk", "80"] and take the hostname.
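The two splits can be traced step by step. This is only a sketch: it assumes Python 3, where the urlparse module from the question lives in urllib.parse, and the URL with credentials and a port is a made-up example.

```python
from urllib.parse import urlparse  # Python 3 location of the urlparse module

# Hypothetical URL carrying credentials and a port, for illustration only.
parsed = urlparse('https://login:password@www.google.co.uk:80/search')

netloc = parsed.netloc                      # full network location
parts_at = netloc.split('@')                # credentials / host:port
after_credentials = parts_at[-1]            # keep the last piece
parts_colon = after_credentials.split(':')  # host / port
host = parts_colon[0]                       # keep the first piece

print(netloc)       # login:password@www.google.co.uk:80
print(parts_at)     # ['login:password', 'www.google.co.uk:80']
print(parts_colon)  # ['www.google.co.uk', '80']
print(host)         # www.google.co.uk
```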

If these parts are omitted, there's no harm in splitting on nonexistent delimiters: split() simply returns a one-element list containing the whole string, so there is no need to check whether they are present.
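A quick way to convince yourself of this is to run the same expression over several netloc shapes. The helper name host_from_netloc and the sample values below are hypothetical, chosen only for illustration.

```python
def host_from_netloc(netloc):
    """Hypothetical helper wrapping the one-liner from the question."""
    return netloc.split('@')[-1].split(':')[0]

# Splitting on a delimiter that is absent yields a one-element list,
# so indexing with [-1] or [0] still returns the whole string.
print('www.google.co.uk'.split('@'))  # ['www.google.co.uk']

samples = [
    'www.google.co.uk',                    # no credentials, no port
    'www.google.co.uk:80',                 # port only
    'login:password@www.google.co.uk',     # credentials only
    'login:password@www.google.co.uk:80',  # both
]
results = [host_from_netloc(s) for s in samples]
print(results)
```

Every entry in results should come out as 'www.google.co.uk', whichever optional parts were present.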

urlparse documentation
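The same documentation also lists hostname, port, username and password attributes on the parse result, which may save you the manual splitting. A minimal sketch, again assuming Python 3's urllib.parse and a made-up URL:

```python
from urllib.parse import urlparse

# Hypothetical URL with credentials and a port, for illustration only.
parsed = urlparse('https://login:password@www.google.co.uk:80/search')

print(parsed.hostname)  # www.google.co.uk
print(parsed.port)      # 80 (an int, or None when no port is given)
print(parsed.username)  # login
print(parsed.password)  # password
```

Note that hostname is returned in lower case, so it can differ in letter case from what the split() approach gives you.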

edited Oct 7, 2021 at 5:59

CommunityBot

answered Jul 4, 2013 at 21:49

Pavel Anossov
