I want to add a scheme to URLs that do not already have one.

import urlparse

p = urlparse.urlparse(url)
print p
netloc = p.netloc or p.path
path = p.path if p.netloc else ''
scheme = p.scheme or 'http'
p = urlparse.ParseResult(scheme, netloc, path, *p[3:])
url = p.geturl()
print url

The above code works fine when the URL has no port number. When a port number is present, it produces unexpected output. For example:

input go.com:8000/3/
output go.com://8000/3/

The same happens with localhost. What approach should I be following in this case?

2 Answers

1

If the URL has a port number but no scheme, it must start with //. As the documentation puts it, urlparse recognizes a netloc only if it is properly introduced by ‘//’. Otherwise the input is presumed to be a relative URL and thus to start with a path component.

Compare the following three cases and observe the difference.

1) In this first sample I have added // so that the parser identifies go.com:8000 as the netloc rather than as a scheme, followed by the path.

urlparse.urlparse('//go.com:8000/3/')
ParseResult(scheme='', netloc='go.com:8000', path='/3/', params='', query='', fragment='')

2) In this sample there is no scheme, no leading // and no port number, so the entire URL is treated as the path.

urlparse.urlparse('go.com/3/')
ParseResult(scheme='', netloc='', path='go.com/3/', params='', query='', fragment='')

3) In this sample I did specify the port, but no scheme and no leading //. Because a scheme is normally followed by ://, the parser takes everything before the first : as the scheme and everything after it as the path.

urlparse.urlparse('go.com:8000/3/')
ParseResult(scheme='go.com', netloc='', path='8000/3/', params='', query='', fragment='')

This is how urlparse parses the URL. To get the scheme handling to work, check the URL for ://. If it is not there, prepend // to the URL before parsing, and the netloc will be recognized correctly.
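The check described above could be sketched roughly as follows. This is only a minimal illustration, not a complete solution: the function name add_scheme and the default_scheme parameter are made up for this example, and the simple '://' test will misfire on inputs such as mailto: addresses or URLs that carry another URL in their query string. The import is written so the snippet runs on both Python 2 and Python 3.

```python
try:
    from urllib.parse import urlparse  # Python 3
except ImportError:
    from urlparse import urlparse      # Python 2


def add_scheme(url, default_scheme='http'):
    """Return url with default_scheme added when it has no '://'.

    add_scheme and default_scheme are hypothetical names for this sketch.
    """
    if '://' not in url:
        # Prefix '//' so urlparse treats the leading part as the netloc
        # instead of mistaking 'host:port' for 'scheme:path'.
        url = '//' + url.lstrip('/')
    # The scheme argument is only used when the URL itself has none.
    parsed = urlparse(url, scheme=default_scheme)
    return parsed.geturl()


result = add_scheme('go.com:8000/3/')
```

With the input from the question, result should come out as http://go.com:8000/3/, and a URL that already has a scheme is returned unchanged.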

For more detail, see the documentation: https://docs.python.org/2/library/urlparse.html

0

According to the docs, the netloc has to be properly introduced with // for it to be parsed correctly. So try adding // at the beginning of the URL when it does not already have a scheme, like this:

urlparse.urlparse('//go.com:8000/3')
ParseResult(scheme='', netloc='go.com:8000', path='/3', params='', query='', fragment='')

This way it correctly identifies each part of the url. Also please see the docs: https://docs.python.org/2/library/urlparse.html#urlparse.urlparse
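As a small illustration of what "each part" means here, once the netloc is recognized the host and port can be read from the result, and a scheme can be filled in afterwards. This is a sketch only: the variable names are made up, 'http' is an assumed default, and the import is written to work on both Python 2 and Python 3.

```python
try:
    from urllib.parse import urlparse  # Python 3
except ImportError:
    from urlparse import urlparse      # Python 2

parsed = urlparse('//go.com:8000/3')

host = parsed.hostname  # host part of the netloc, without the port
port = parsed.port      # port as an integer, or None if absent

# ParseResult is a namedtuple, so _replace returns a copy with the
# scheme filled in; 'http' is an assumed default for this sketch.
full_url = parsed._replace(scheme='http').geturl()
```

Here host is go.com, port is 8000, and full_url is http://go.com:8000/3.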
