django - URL validation in Python - edge cases

I'm trying to validate URLs in Python purely by analyzing the string (no HTTP requests), but every solution I have tried fails on some edge cases.

Even after adapting the regex from Django's URLValidator, some inputs are still misclassified:

import re

def is_url_valid(url: str) -> bool:
    # Pattern adapted from Django's URLValidator
    url_pattern = re.compile(
        r'^(?:http|ftp)s?://'  # http://, https://, ftp:// or ftps://
        r'(?:(?:[A-Z0-9](?:[A-Z0-9-]{0,61}[A-Z0-9])?\.)+(?:[A-Z]{2,6}\.?|[A-Z0-9-]{2,}\.?)|'  # domain...
        r'localhost|'  # localhost...
        r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})'  # ...or ip
        r'(?::\d+)?'  # optional port
        r'(?:/?|[/?]\S+)$', re.IGNORECASE)  # optional path or query string

    return bool(url_pattern.match(url))

We still get:

>>> is_url_valid('contact.html')
True

Other approaches we tried:

The validators Python package, recommended by this SO Q&A, rejects a URL that has no scheme:

>>> import validators
>>> validators.url("firespecialties.com/bullseyetwo.html") # this is a valid url
ValidationFailure(func=url, args={'value': 'firespecialties.com/bullseyetwo.html', 'public': False})

From this "validating URLs in Python" SO Q&A: urllib.parse.urlparse('contact.html') correctly treats the input as a path, but urllib.parse.urlparse('www.images.com/example.html') is also parsed as a path, with an empty netloc:

>>> from urllib.parse import urlparse
>>> urlparse('www.images.com/example.html')
ParseResult(scheme='', netloc='', path='www.images.com/example.html', params='', query='', fragment='')
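One way to work around the empty netloc is to prepend a default scheme before parsing and then inspect the host. The following is only a minimal sketch under stated assumptions, not a complete validator: `looks_like_url`, `default_scheme`, and the `_FILE_EXTENSIONS` set are hypothetical names introduced here, and the extension list is an arbitrary heuristic for telling `contact.html` apart from a real domain.

```python
import ipaddress
import re
from urllib.parse import urlparse

# Hypothetical helpers for illustration; none of these names come from the post.
_SCHEME_PREFIX = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*://")
_HOST_LABEL = re.compile(r"(?!-)[A-Za-z0-9-]{1,63}(?<!-)")
_TLD = re.compile(r"[A-Za-z]{2,63}")
# Assumption: common file extensions that should not count as a TLD
# when the input has no explicit scheme.
_FILE_EXTENSIONS = {"html", "htm", "php", "asp", "aspx", "jsp", "pdf", "js", "css"}


def looks_like_url(candidate: str, default_scheme: str = "http") -> bool:
    """Heuristic, string-only check that tolerates a missing scheme."""
    candidate = candidate.strip()
    if not candidate or any(ch.isspace() for ch in candidate):
        return False

    had_scheme = _SCHEME_PREFIX.match(candidate) is not None
    if not had_scheme:
        # Without "//", urlparse puts everything into `path`.
        candidate = f"{default_scheme}://{candidate}"

    try:
        parsed = urlparse(candidate)
    except ValueError:  # e.g. malformed IPv6 literal
        return False

    if parsed.scheme not in {"http", "https", "ftp", "ftps"}:
        return False
    host = parsed.hostname
    if not host:
        return False
    if host == "localhost":
        return True

    try:
        ipaddress.ip_address(host)  # accept IPv4/IPv6 literals
        return True
    except ValueError:
        pass

    labels = host.split(".")
    if len(labels) < 2 or not all(_HOST_LABEL.fullmatch(label) for label in labels):
        return False
    tld = labels[-1]
    if not _TLD.fullmatch(tld):
        return False
    # Scheme-less input whose last label is a file extension is treated as a path.
    if not had_scheme and tld in _FILE_EXTENSIONS:
        return False
    return True


print(looks_like_url("contact.html"))                 # False
print(looks_like_url("www.images.com/example.html"))  # True
```

The extension check is the weak point of this sketch: it is applied only to scheme-less input, and any extension missing from the set (or any real TLD added to it) will be misclassified, so the set would need tuning against real data.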

Adapting logic from this JavaScript SO Q&A

asked Jan 3, 2023 at 20:54

Shili Ho
