0

This function takes a string as an input, and if the string starts with http:// or the string starts with https://, the function will assume that the string is an absolute link. If the URL starts with /, the function will convert it to an absolute link.

Note that base is a global variable for now. My main concern is that this function is making too many assumptions. Is there a way to accomplish the task of resolving URLs without so many assumptions?

def get_url(item):
    #absolute link
    if item.startswith('http://') or item.startswith('https://'):
        url = item
    #root-relative link
    elif item.startswith('/'):
        url = base + item
    else:
        url = base + "/" + item
    return url
2
  • Try using the urlparse module. Commented Nov 11, 2015 at 4:13
  • If you don't mind, Could you implement this function using the urlparse module. If not thats fine as well. Commented Nov 11, 2015 at 4:18

2 Answers 2

1

Use urljoin from the urlparse module.

from urlparse import urljoin

base = 'http://myserver.com'

def get_url(item):
    return urljoin(base, item)

urljoin handles absolute or relative links itself.

Examples

print get_url('/paul.html')
print get_url('//otherserver.com/paul.html')
print get_url('https://paul.com/paul.html')
print get_url('dir/paul.html')

Output

http://myserver.com/paul.html
http://otherserver.com/paul.html
https://paul.com/paul.html
http://myserver.com/dir/paul.html
Sign up to request clarification or add additional context in comments.

4 Comments

You have to keep in mind that we don't know if item is a relative link or an absolute link. or if item is even a link for that matter.
Does my edit clarify the first point? If it were not a link what would you expect the function to do?
Why use a regular expression instead of python string method startswith? I'm probably missing the point.
My apologizes , that was for the post below.
0

1-Use regex

2-Add a trailing / to you base url

import re        
base = 'http://www.example.com/'

def get_url(item):
    #absolute link
    pattern = "(http|https)://[\w\-]+(\.[\w\-]+)+\S*"  # regex pattern to approve http and https started strings
    if re.search(pattern, item):
        url = item
    #root-relative link
    else:
        url = base + item.lstrip('/')
    return url

2 Comments

Why use a regular expression instead of python string method startswith? I'm probably missing the point.
@RickyWilson : It's reduced and more clear, when you're going to match a string pattern better go with regex instead of playing with string methods

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.