
I'm using Python and trying to use a regex to check whether a string contains a URL. I've tried several different regexes, but they always return None, even when the string is clearly a website address.

Example:

>>> print(re.search(r'/((?:https?\:\/\/|www\.)(?:[-a-z0-9]+\.)*[-a-z0-9]+.*)/i','www.google.com'))
None

Any help would be appreciated!

Comment (Dec 20, 2014 at 3:54): remove the leading / and the trailing /i.
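To illustrate the comment above: the leading `/` and trailing `/i` are regex-literal delimiters from languages such as JavaScript or Perl. Python treats them as literal characters, so the pattern from the question cannot match a string that lacks them. A minimal sketch of the corrected call, assuming the intent was case-insensitive matching (the variable names are only illustrative):

```python
import re

# Original pattern with the /.../i delimiters: Python treats "/" and "/i"
# as literal text that must appear in the string, so nothing matches.
broken = r'/((?:https?\:\/\/|www\.)(?:[-a-z0-9]+\.)*[-a-z0-9]+.*)/i'

# Same pattern with the delimiters removed; case-insensitivity is
# requested through the re.I flag instead.
fixed = r'((?:https?\:\/\/|www\.)(?:[-a-z0-9]+\.)*[-a-z0-9]+.*)'

text = 'www.google.com'
broken_match = re.search(broken, text)
fixed_match = re.search(fixed, text, re.I)

print(broken_match)          # None
print(fixed_match.group(1))  # www.google.com
```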

4 Answers

1

What about switching to something like the following, as in "Python Regex for URL doesn't work":

r'http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\(\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+'

For a detailed survey of many, many regexes validating URLs, see https://mathiasbynens.be/demo/url-regex ...
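As a rough sketch of how the pattern above might be used to pull URLs out of free text (the sample sentence and variable names are assumptions, not part of the answer):

```python
import re

# Pattern quoted from the answer above. It requires an explicit
# http:// or https:// scheme.
URL_PATTERN = re.compile(
    r'http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\(\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+'
)

text = 'See https://mathiasbynens.be/demo/url-regex for a survey'

# All groups are non-capturing, so findall returns the full matches.
found = URL_PATTERN.findall(text)
print(found)  # ['https://mathiasbynens.be/demo/url-regex']

# A bare host name without a scheme is not matched by this pattern.
no_scheme = URL_PATTERN.search('www.google.com')
print(no_scheme)  # None
```

Note that this pattern would not match the 'www.google.com' example from the question, since that string has no scheme.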


0

If you want to check whether a string is a URL, you can use:

print(re.search(r'(^(https?://|www\.)([a-z0-9-]+\.)+([a-z0-9]+)$)','www.google.com', re.I))

If you want to check whether a string contains a URL, you only need to remove the ^ and $ anchors:

print(re.search(r'((https?://|www\.)([a-z0-9-]+\.)+([a-z0-9]+))','www.google.com', re.I))

Remember: re.I enables case-insensitive matching, ^ matches the beginning of the string, and $ matches the end of the string (they match at each line only when re.M is also set).
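The difference between the anchored and unanchored forms can be sketched as follows. The two patterns are the ones from this answer; the names IS_URL and HAS_URL and the sample sentence are illustrative assumptions:

```python
import re

# Anchored: the whole string must be a URL.
IS_URL = re.compile(r'(^(https?://|www\.)([a-z0-9-]+\.)+([a-z0-9]+)$)', re.I)
# Unanchored: a URL may appear anywhere in the string.
HAS_URL = re.compile(r'((https?://|www\.)([a-z0-9-]+\.)+([a-z0-9]+))', re.I)

sentence = 'Visit WWW.Google.com today'

whole = IS_URL.search(sentence)    # fails: the string does not start with a URL
inside = HAS_URL.search(sentence)  # succeeds: re.I lets "WWW." match "www\."

print(whole)            # None
print(inside.group(0))  # WWW.Google.com
```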


0

The grammar for a valid URL is explained in this Wiki. Based on that, the following regex matches a string that starts with a URL.

^((?:https?|ftp):\/{2}[\w.\/]+(?::\d{1,4})?\/?[?\w_#\/.]+)

And if you want to make the scheme part of the URL optional:

^((?:https?|ftp)?(?::\/{2})?[\w.\/]+(?::\d{1,4})?\/?[?\w_#\/.]+)

Output

>>> re.search(r'^((?:https?|ftp)?(?::\/{2})?[\w.\/]+(?::\d{1,4})?\/?[?\w_#\/.]+)','www.google.com').group()
'www.google.com'
>>> re.search(r'^((?:https?|ftp)?(?::\/{2})?[\w.\/]+(?::\d{1,4})?\/?[?\w_#\/.]+)','http://www.google.com').group()
'http://www.google.com'
>>> re.search(r'^((?:https?|ftp)?(?::\/{2})?[\w.\/]+(?::\d{1,4})?\/?[?\w_#\/.]+)','https://www.google.com').group()
'https://www.google.com'

You can see a detailed demo and an explanation of how it works here.


0

I've used the following regex to verify that the input string is a URL:

r'((http|https)\:\/\/)?[a-zA-Z0-9\.\/\?\:@\-_=#]+\.([a-zA-Z]){2,6}([a-zA-Z0-9\.\&\/\?\:@\-_=#])*'
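The answer does not show how the pattern is applied. One possible sketch, assuming the whole input should match and therefore using re.fullmatch (available since Python 3.4); the helper name looks_like_url is hypothetical:

```python
import re

# Pattern quoted from the answer above, split across two lines for readability.
URL_RE = re.compile(
    r'((http|https)\:\/\/)?[a-zA-Z0-9\.\/\?\:@\-_=#]+\.([a-zA-Z]){2,6}'
    r'([a-zA-Z0-9\.\&\/\?\:@\-_=#])*'
)

def looks_like_url(value):
    """Hypothetical helper: True if the whole string matches URL_RE."""
    return URL_RE.fullmatch(value) is not None

print(looks_like_url('http://www.google.com'))  # True
print(looks_like_url('www.google.com'))         # True
print(looks_like_url('not a url'))              # False
```

The pattern is fairly permissive (any dotted token ending in two to six letters passes), so it is better suited to a quick sanity check than to strict validation.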
