
I'm writing a personal wiki-style program in Python that stores text files in a user configurable directory.

The program should be able to take a string (e.g. foo) from a user and create a filename of foo.txt. The user will only be able to create the file inside the wiki directory, and slashes will create a subdir (e.g. foo/bar becomes (path-to-wiki)/foo/bar.txt).

What is the best way to check that the input is as safe as possible? What do I need to watch out for? I know some common pitfalls are:

  • Directory traversal: ../
  • Null bytes: \0

I realize that taking user input for filenames is never 100% safe, but the program will only be run locally and I just want to guard for any common errors/glitches.

  • What target operating system? What versions of Python? Commented Dec 31, 2011 at 7:07
  • @IgnacioVazquez-Abrams: yes, but plain text files in the file-system have other benefits. Commented Dec 31, 2011 at 7:52
  • @g.d.d.c: Python 2.7 and/or 3.2, and MacOS/Linux primarily. Commented Dec 31, 2011 at 7:54
  • stackoverflow.com/questions/4814040/… Commented Mar 3, 2017 at 21:53

4 Answers

13

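You can force the user to create files and directories inside the wiki by normalizing the path with os.path.normpath and then checking that the result still begins with, say, '(path-to-wiki)/'. Include the trailing separator in the prefix so that a sibling directory such as '(path-to-wiki)-old' does not pass the check.

os.path.normpath('(path-to-wiki)/foo/bar.txt').startswith('(path-to-wiki)/')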

To make sure the user's path or filename contains nothing nasty, you can restrict it to upper- and lower-case letters, digits, and perhaps hyphens and underscores.

You can then check the normalized filename with a regular expression along these lines:

userpath=os.path.normpath('(path-to-wiki)/foo/bar.txt')
re.findall(r'[^A-Za-z0-9_\-\\]',userpath)

To summarize, with userpath = os.path.normpath('(path-to-wiki)/foo/bar.txt'), the combined check is:

if (not userpath.startswith('(path-to-wiki)/')
        or re.search(r'[^A-Za-z0-9_\-\\]', userpath)):
    pass  # do whatever you want with an invalid path
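
As a rough illustration of the containment check described above, here is a minimal sketch. The function name resolve_inside and the example directory are hypothetical, and it swaps os.path.normpath for os.path.realpath so that symlinks are resolved as well, which is an assumption about what you want. It only checks where the path ends up and does not filter characters.

```python
import os


def resolve_inside(base_dir, user_path):
    """Return the absolute path for user_path under base_dir, or raise ValueError."""
    base = os.path.realpath(base_dir)
    # Joining first means an absolute user_path replaces base and is caught below.
    target = os.path.realpath(os.path.join(base, user_path + ".txt"))
    # Compare against base plus a separator so that a sibling directory such as
    # "/srv/wiki-old" does not count as being inside "/srv/wiki".
    if not target.startswith(base.rstrip(os.sep) + os.sep):
        raise ValueError("path escapes the wiki directory: %r" % user_path)
    return target
```

Calling resolve_inside(wiki_dir, "foo/bar") would give the path of foo/bar.txt under the wiki directory, while an input such as "../secret" raises ValueError.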

2 Comments

Your regex is a little extreme! There are so many more characters that are allowed in paths!
Also, you don't allow forward slashes in your paths?
10

There is now a full library for validating and sanitizing path strings, pathvalidate:

from pathvalidate import sanitize_filepath

fpath = "fi:l*e/p\"a?t>h|.t<xt"
print("{} -> {}".format(fpath, sanitize_filepath(fpath)))

fpath = "\0_a*b:c<d>e%f/(g)h+i_0.txt"
print("{} -> {}".format(fpath, sanitize_filepath(fpath)))

output:

fi:l*e/p"a?t>h|.t<xt -> file/path.txt
_a*b:c<d>e%f/(g)h+i_0.txt -> _abcde%f/(g)h+i_0.txt

1 Comment

You probably want to use sanitize_filename to avoid keeping slashes in the string (which stand for subdirectories)
6

Armin Ronacher has a blog post on this subject (and others).

These ideas are implemented as the safe_join() function in Flask:

def safe_join(directory, filename):
    """Safely join `directory` and `filename`.
    Example usage::
        @app.route('/wiki/<path:filename>')
        def wiki_page(filename):
            filename = safe_join(app.config['WIKI_FOLDER'], filename)
            with open(filename, 'rb') as fd:
                content = fd.read() # Read and process the file content...
    :param directory: the base directory.
    :param filename: the untrusted filename relative to that directory.
    :raises: :class:`~werkzeug.exceptions.NotFound` if the resulting path
             would fall out of `directory`.
    """
    filename = posixpath.normpath(filename)
    for sep in _os_alt_seps:
        if sep in filename:
            raise NotFound()
    if os.path.isabs(filename) or filename.startswith('../'):
        raise NotFound()
    return os.path.join(directory, filename)
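
The function quoted above relies on names defined elsewhere in its module (posixpath, _os_alt_seps, NotFound), so it does not run on its own. Below is a self-contained sketch of the same idea, not Flask's actual code: the name safe_join_sketch is hypothetical, ValueError stands in for NotFound, and the explicit check for a bare ".." is an addition, since normpath can return ".." without a trailing slash.

```python
import os
import posixpath

# Separators other than "/" that the current OS also accepts ("\\" on Windows).
_alt_seps = [sep for sep in (os.sep, os.altsep) if sep not in (None, "/")]


def safe_join_sketch(directory, filename):
    """Join directory and an untrusted relative filename, or raise ValueError."""
    # Collapse "a/../b" style segments before inspecting the result.
    filename = posixpath.normpath(filename)
    if any(sep in filename for sep in _alt_seps):
        raise ValueError("alternative path separator in filename")
    if (
        os.path.isabs(filename)
        or filename == ".."
        or filename.startswith("../")
    ):
        raise ValueError("filename escapes the base directory")
    return os.path.join(directory, filename)
```

Because the filename is normalized first, any path that climbs out of the base directory ends up starting with "..", which is why a single prefix test is enough after normpath.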


0

You could keep only the characters that are alphanumeric ASCII or one of ' ', '.', and '/', and then remove all instances of bad combinations:

safe_string = str()
for c in user_supplied_string:
    if c.isalnum() or c in [' ','.','/']:
        safe_string = safe_string + c

while safe_string.count("../"):
    # I use a loop because only replacing once would 
    # leave a hole in that a bad guy could enter ".../"
    # which would be replaced to "../" so the loop 
    # prevents tricks like this!
    safe_string = safe_string.replace("../","./")
# Get rid of leading "./" combinations...
safe_string = safe_string.lstrip("./")

That's what I would do. I don't know how Pythonic it is, but it should leave you pretty safe. If you want to validate rather than convert, you can test for equality afterwards:

valid = safe_string == user_supplied_string
if not valid:
    raise Exception("Sorry, the string %s contains invalid characters" % user_supplied_string)

In the end both approaches would probably work. I find this method feels a bit more explicit, and it should also screen out any weird or inappropriate characters like '\t', '\r', or '\n'. Cheers!
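
If you prefer the validate-only route, one possible variation is to check each '/'-separated component instead of filtering single characters. The sketch below is only an illustration under assumptions: the name is_safe_wiki_name is hypothetical, it assumes Python 3 strings, and unlike the whitelist above it lets non-ASCII letters and most punctuation through, rejecting only empty, "." and ".." components and control characters.

```python
import unicodedata


def is_safe_wiki_name(name):
    """Return True if every '/'-separated component of name looks harmless."""
    if not name:
        return False
    for part in name.split("/"):
        # Empty parts come from leading, trailing, or doubled slashes.
        if part in ("", ".", ".."):
            return False
        # Unicode category "Cc" covers control characters such as \0, \t, \r, \n.
        if any(unicodedata.category(ch) == "Cc" for ch in part):
            return False
    return True
```

This rejects input instead of rewriting it, so the user sees an error and no file is silently created under a different name.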

1 Comment

You can have non-ASCII filenames.
