How Do I Resolve Links with Python?

Question

This function takes a string as an input, and if the string starts with http:// or the string starts with https://, the function will assume that the string is an absolute link. If the URL starts with /, the function will convert it to an absolute link.

Note that base is a global variable for now. My main concern is that this function is making too many assumptions. Is there a way to accomplish the task of resolving URLs without so many assumptions?

def get_url(item):
    #absolute link
    if item.startswith('http://') or item.startswith('https://'):
        url = item
    #root-relative link
    elif item.startswith('/'):
        url = base + item
    else:
        url = base + "/" + item
    return url

If you don't mind, Could you implement this function using the urlparse module. If not thats fine as well. — Ricky Wilson
– Ricky Wilson, Commented Nov 11, 2015 at 4:18

Paul Rooney · Accepted Answer · 2015-11-11 04:36:27Z

1

Use urljoin from the urlparse module.

from urlparse import urljoin

base = 'http://myserver.com'

def get_url(item):
    return urljoin(base, item)

urljoin handles absolute or relative links itself.

Examples

print get_url('/paul.html')
print get_url('//otherserver.com/paul.html')
print get_url('https://paul.com/paul.html')
print get_url('dir/paul.html')

Output

http://myserver.com/paul.html
http://otherserver.com/paul.html
https://paul.com/paul.html
http://myserver.com/dir/paul.html

edited Nov 11, 2015 at 4:36

answered Nov 11, 2015 at 4:25

Paul Rooney

21.7k9 gold badges47 silver badges64 bronze badges

Sign up to request clarification or add additional context in comments.

4 Comments

Ricky Wilson Over a year ago

You have to keep in mind that we don't know if item is a relative link or an absolute link. or if item is even a link for that matter.

Paul Rooney Over a year ago

Does my edit clarify the first point? If it were not a link what would you expect the function to do?

Ricky Wilson Over a year ago

Why use a regular expression instead of python string method startswith? I'm probably missing the point.

Ricky Wilson Over a year ago

My apologizes , that was for the post below.

Serjik · Accepted Answer · 2015-11-11 04:33:45Z

0

1-Use regex

2-Add a trailing / to you base url

import re        
base = 'http://www.example.com/'

def get_url(item):
    #absolute link
    pattern = "(http|https)://[\w\-]+(\.[\w\-]+)+\S*"  # regex pattern to approve http and https started strings
    if re.search(pattern, item):
        url = item
    #root-relative link
    else:
        url = base + item.lstrip('/')
    return url

edited Nov 11, 2015 at 4:33

answered Nov 11, 2015 at 4:28

Serjik

11.1k8 gold badges63 silver badges72 bronze badges

2 Comments

Ricky Wilson Over a year ago

Why use a regular expression instead of python string method startswith? I'm probably missing the point.

Serjik Over a year ago

@RickyWilson : It's reduced and more clear, when you're going to match a string pattern better go with regex instead of playing with string methods

Collectives™ on Stack Overflow

How Do I Resolve Links with Python?

2 Answers 2

4 Comments

2 Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

4 Comments

2 Comments

Your Answer

Sign up or log in

Post as a guest

Related