Regex: validate a URL path with no query params

Question

I'm not a regex expert, and I'm struggling to write one that seems very simple and that works in Python 2.7: validate the path of a URL (no hostname) without the query string. In other words, a string that starts with /, allows alphanumeric values and doesn't allow any other special chars except these: /, ., -

I found this post, which is very similar to what I need, but it isn't working for me at all: if I test with, for example, aaa, it returns true even though the string doesn't start with /.

The current regex that I have kinda working is this one:

[^/+a-zA-Z0-9.-]

but it doesn't work with paths that don't start with /. For example:

/aaa -> true, this is ok
/aaa/bbb -> true, this is ok
/aaa?q=x -> false, this is ok
aaa -> true, this is NOT ok

Andrew Cheong · Accepted Answer · 2012-10-17 07:08:01Z

6

The regex you've defined is a character class. Instead, try:

^\/[/.a-zA-Z0-9-]+$
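A minimal sketch of how this pattern could be used from Python, run against the four cases from the question. The names PATH_RE and is_valid_path are made up for illustration and are not part of the answer.

```python
import re

# Anchored pattern from the answer: a leading "/", then one or more of
# "/", ".", "-", letters, or digits, up to the end of the string.
PATH_RE = re.compile(r'^\/[/.a-zA-Z0-9-]+$')

def is_valid_path(path):
    """Return True if `path` matches the anchored pattern (hypothetical helper)."""
    return PATH_RE.match(path) is not None

# The four test strings from the question.
for candidate in ['/aaa', '/aaa/bbb', '/aaa?q=x', 'aaa']:
    print('%s -> %s' % (candidate, is_valid_path(candidate)))
```

The first two strings are accepted and the last two are rejected. Note that a bare / is also rejected, because + requires at least one character after the leading slash; changing + to * would accept it, if that is what you want.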


4 Comments

Morten Jensen Over a year ago

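The difference is that the pattern now requires the string to begin with / (^ anchors the match at the beginning of the line), and it adds the + quantifier and the end-of-line anchor $. Together these force the character class to match one or more characters all the way to the end, so the whole string is checked rather than a single character.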

Jon Over a year ago

^\/[\/\.a-zA-Z0-9\-]+$, the /, ., and - should be escaped.

Andrew Cheong Over a year ago

@Jon - You are right about escaping / in many implementations, but in Python, it need not be escaped. . need not be escaped either, as long as it's within a character class, at least in any implementation I know of. And - need not be escaped if it is specified first or last in a character class. Of course, not that it hurts to escape "too much," but just wanted to clarify these details.
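A quick way to check the escaping point in Python: compile both spellings and compare them on a few inputs. This is only a sketch; the sample strings are arbitrary and the variable names are made up for illustration.

```python
import re

# Same pattern, without and with the extra backslashes.
unescaped = re.compile(r'^/[/.a-zA-Z0-9-]+$')
escaped = re.compile(r'^\/[\/\.a-zA-Z0-9\-]+$')

# Arbitrary sample inputs chosen for this sketch.
samples = ['/aaa', '/a.b-c/d', '/aaa?q=x', 'aaa', '/a b']

# True when both spellings accept or reject every sample identically.
spellings_agree = all(
    bool(unescaped.match(s)) == bool(escaped.match(s)) for s in samples
)

# Inside a character class, "." is a literal dot, not "any character".
dot_is_literal = re.match(r'[.]', 'x') is None

print(spellings_agree, dot_is_literal)
```

Both values come out True, so in Python the backslashes are a matter of taste here.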

localhostdotdev Over a year ago

You might want to use \A and \Z (Python's spelling of the end-of-string anchor) instead of ^ and $ to match the very beginning and end of a potentially multi-line string.
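To illustrate why the anchors matter: in Python's re module the strict end-of-string anchor is written \Z, and $ also matches just before a trailing newline. A small sketch (variable names are made up for illustration):

```python
import re

# Same character class, two different pairs of anchors.
dollar = re.compile(r'^/[/.a-zA-Z0-9-]+$')
strict = re.compile(r'\A/[/.a-zA-Z0-9-]+\Z')

with_newline = '/aaa\n'

# "$" also matches just before a newline at the very end of the string.
print(bool(dollar.match(with_newline)))
# "\Z" matches only at the true end of the string.
print(bool(strict.match(with_newline)))
```

This prints True and then False, so if a trailing newline must be rejected, the \A ... \Z form is the safer choice.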

Burhan Khalid · Accepted Answer · 2012-10-17 07:26:10Z

In other words, a string that starts with /, allows alphanumeric values and doesn't allow any other special chars except these: /, ., -

You are missing some characters that are valid in URLs

import string
import urllib
import urlparse  # Python 2 module names; Python 3 moved these into urllib.parse

# Characters allowed in a path: letters, digits, and "/", ".", "-", "~"
valid_chars = string.letters + string.digits + '/.-~'
valid_paths = []

urls = ['http://www.my.uni.edu/info/matriculation/enroling.html',
        'http://info.my.org/AboutUs/Phonebook',
        'http://www.library.my.town.va.us/Catalogue/76523471236%2Fwen44--4.98',
        'http://www.my.org/462F4F2D4241522A314159265358979323846',
        'http://www.myu.edu/org/admin/people#andy',
        'http://www.w3.org/RDB/EMP?*%20where%20name%%3Ddobbins']

for url in urls:
    # urlparse drops the query string and fragment; unquote decodes %XX escapes
    path = urllib.unquote(urlparse.urlparse(url).path)
    # startswith also handles an empty path, where path[0] would raise IndexError
    if path.startswith('/') and all(ch in valid_chars for ch in path):
        valid_paths.append(path)
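For readers on Python 3, the same whitelist idea might look like the sketch below. It swaps in urllib.parse and string.ascii_letters (string.letters no longer exists there), and the function name extract_valid_path is made up for illustration.

```python
import string
from urllib.parse import unquote, urlparse

# Same whitelist as above, built from ASCII letters only.
VALID_CHARS = set(string.ascii_letters + string.digits + '/.-~')

def extract_valid_path(url):
    """Return the decoded path of `url` if it passes the whitelist, else None.

    Hypothetical helper, not part of the original answer.
    """
    # urlparse separates out the query string and fragment;
    # unquote decodes %XX escapes such as %2F.
    path = unquote(urlparse(url).path)
    if path.startswith('/') and all(ch in VALID_CHARS for ch in path):
        return path
    return None

print(extract_valid_path('http://www.myu.edu/org/admin/people#andy'))
```

Because the path is decoded before it is checked, an encoded slash (%2F) ends up as a plain / in the result. Whether that is desirable depends on how the path is used afterwards.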

Gábor Lipták · Accepted Answer · 2012-10-17 07:07:45Z

0

Try this:

^(?:/[a-zA-Z0-9.-&&[^/]]*)+$
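One caveat: the && inside the character class looks like Java-style class intersection, and Python's re module has no such operator, so this pattern should not be assumed to carry over to Python 2.7 unchanged. Below is a rough Python sketch, assuming the intent is "one or more slash-led segments made of letters, digits, dots, and dashes". The pattern and names are my reading of the answer, not the author's code.

```python
import re

# Each segment starts with "/" and may be followed by allowed characters.
# "/" is simply left out of the class, so no intersection is needed.
SEGMENTED_RE = re.compile(r'^(?:/[a-zA-Z0-9.-]*)+$')

def is_segmented_path(path):
    """Return True if `path` is a sequence of slash-led segments (hypothetical helper)."""
    return SEGMENTED_RE.match(path) is not None

for candidate in ['/aaa', '/aaa/bbb', '/', '/aaa?q=x', 'aaa']:
    print('%s -> %s' % (candidate, is_segmented_path(candidate)))
```

Unlike the ^\/[/.a-zA-Z0-9-]+$ pattern, this form accepts a bare / because each segment may be empty.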

Seems to work.

Morten Jensen · Accepted Answer · 2012-10-17 07:08:20Z

0

Try posting some more code; I can't tell from your question how you're using the regex. What confuses me is that your expression [^/+a-zA-Z0-9.-] basically says:

Match a single character if it is:

not a /, a +, a letter (a-z, upper or lower case), a digit (0-9), a dot, or a dash

It doesn't quite make sense to me without knowing how you use it, as it only matches a single character and not a whole URL string.
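Here is a sketch of what that negated class does when used to look for disallowed characters. It assumes the pattern is applied with re.search and the result inverted, which the question doesn't actually show; the names are made up for illustration.

```python
import re

# The negated class from the question: matches ONE character that is
# not "/", "+", a letter, a digit, ".", or "-".
FORBIDDEN_RE = re.compile(r'[^/+a-zA-Z0-9.-]')

def has_forbidden_char(text):
    """True if `text` contains any character outside the allowed set."""
    return FORBIDDEN_RE.search(text) is not None

for candidate in ['/aaa', '/aaa?q=x', 'aaa']:
    print('%s -> forbidden char found: %s'
          % (candidate, has_forbidden_char(candidate)))
```

Only /aaa?q=x reports a forbidden character. aaa contains nothing outside the allowed set, so a check built on this class alone lets it through: the class says which characters are acceptable, but nothing about where the / has to be.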

I'm not sure I understand why you cannot start with a /.


1 Comment

Maciej Kravchyk Over a year ago

Because it's a URL path. Right after the host, there must be a slash. Let's say you have var ajax_path = 'example.com' + unvalidated_path. If the path doesn't start with /, a malicious user could supply a value that begins with ".", like .externaldomain.com, and perform an XSS attack.
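The same concern, sketched in Python: validate the path before concatenating it onto the host. build_url and PATH_RE are hypothetical names, and the pattern is the one from the answer above, written with the stricter \A and \Z anchors.

```python
import re

# Leading "/" required, then only "/", ".", "-", letters, and digits.
PATH_RE = re.compile(r'\A/[/.a-zA-Z0-9-]+\Z')

def build_url(host, unvalidated_path):
    """Join host and path, refusing any path that fails validation."""
    if not PATH_RE.match(unvalidated_path):
        raise ValueError('invalid path: %r' % (unvalidated_path,))
    return host + unvalidated_path

print(build_url('example.com', '/aaa/bbb'))

try:
    build_url('example.com', '.externaldomain.com')
except ValueError as exc:
    # Without the leading-slash check this would have produced
    # "example.com.externaldomain.com", a different host.
    print('rejected: %s' % exc)
```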
