The Golang filepath module (https://golang.org/pkg/path/filepath/) contains a few functions for manipulating paths and os.Stat can be used to check if a file exists. Is there a way to check if a string actually forms a valid path at all (regardless of whether there's a file at that path or not)?
2 Answers
This problem sounds very simple, but it actually is not. Here are two possible solutions that I found:
Solution 1 - Academic
The idea here is to check a given filepath based on rules.
Problems
- Operating system (UNIX / Windows)
- Filesystem
- Reserved keywords
Operating system
The first one is the easiest. Go provides various tools for OS-specific filenames/separators/...
Example in the os package:
const (
	PathSeparator     = '/' // OS-specific path separator
	PathListSeparator = ':' // OS-specific path list separator
)
Another one in the filepath package:
// VolumeName returns leading volume name.
// Given "C:\foo\bar" it returns "C:" on Windows.
// Given "\\host\share\foo" it returns "\\host\share".
// On other platforms it returns "".
func VolumeName(path string) string {
	return path[:volumeNameLen(path)]
}
Filesystem
Filesystems have different restrictions: the maximum name length and the allowed character set both vary. Unfortunately, as far as I know, there is no reliable way to tell which filesystem(s) your path will traverse.
Reserved keywords
Have a blacklist of all reserved keywords for a given OS.
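A blacklist for Windows might look like the sketch below. The device names listed (CON, PRN, AUX, NUL, COM1 to COM9, LPT1 to LPT9) are the well-known reserved ones; the helper name isReservedWindowsName is hypothetical, and treating everything before the first dot as the base name is a simplifying assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// reservedNames holds reserved Windows device names in upper case.
var reservedNames = map[string]bool{
	"CON": true, "PRN": true, "AUX": true, "NUL": true,
}

func init() {
	// COM1..COM9 and LPT1..LPT9 are reserved as well.
	for i := 1; i <= 9; i++ {
		reservedNames[fmt.Sprintf("COM%d", i)] = true
		reservedNames[fmt.Sprintf("LPT%d", i)] = true
	}
}

// isReservedWindowsName (hypothetical helper) reports whether a single file
// name, with or without an extension, collides with a reserved device name.
func isReservedWindowsName(name string) bool {
	base := name
	if i := strings.IndexByte(base, '.'); i >= 0 {
		base = base[:i] // ASSUMPTION: the part before the first dot decides
	}
	base = strings.ToUpper(strings.TrimRight(base, " "))
	return reservedNames[base]
}

func main() {
	for _, n := range []string{"aux", "aux.txt", "auxiliary.txt", "COM3"} {
		fmt.Printf("%q reserved: %t\n", n, isReservedWindowsName(n))
	}
}
```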
Implementation
For this solution, I would build a lexer/parser.
The tradeoff is that it would not guarantee 100% that a filepath is valid.
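Short of a full lexer/parser, a rule-based check can be sketched as a list of small functions applied to every path element. Everything here is illustrative: the rule type, the helper names, and the forbiddenChars set (the NUL character plus ? % * : | \ ") are assumptions, and the sketch only handles relative, slash-separated paths.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// rule inspects one path element and returns an error if it is rejected.
type rule func(elem string) error

// forbiddenChars is an ASSUMED conservative blacklist: NUL ? % * : | \ "
const forbiddenChars = "\x00?%*:|\\\""

// noForbiddenChars rejects elements containing a blacklisted character.
func noForbiddenChars(elem string) error {
	if i := strings.IndexAny(elem, forbiddenChars); i >= 0 {
		return fmt.Errorf("element %q contains forbidden character %q", elem, elem[i])
	}
	return nil
}

// notEmpty rejects empty elements, as produced by "a//b" or a leading slash.
func notEmpty(elem string) error {
	if elem == "" {
		return errors.New("empty path element")
	}
	return nil
}

// validate applies every rule to every slash-separated element and returns
// the first violation it finds.
func validate(path string, rules ...rule) error {
	for _, elem := range strings.Split(path, "/") {
		for _, r := range rules {
			if err := r(elem); err != nil {
				return err
			}
		}
	}
	return nil
}

func main() {
	for _, p := range []string{"docs/report.txt", "docs/what?.txt", "a//b"} {
		fmt.Printf("%q -> %v\n", p, validate(p, notEmpty, noForbiddenChars))
	}
}
```

New rules (length limits, reserved names) can be added without touching validate, which keeps the OS-specific parts isolated.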
Solution 2 - Empirical
Attempt to create the file and delete it right after.
func IsValid(fp string) bool {
	// The path is valid if something already exists there.
	if _, err := os.Stat(fp); err == nil {
		return true
	}
	// Otherwise, try to create an empty file.
	// os.WriteFile replaces the deprecated ioutil.WriteFile (Go 1.16+).
	if err := os.WriteFile(fp, nil, 0644); err == nil {
		os.Remove(fp) // Clean up the probe file
		return true
	}
	return false
}
The main benefit of this solution is that it is straightforward and more accurate. If a file already exists or can be created at a given path, the path is valid. However, this solution can also reject valid paths, for example when the process lacks permission to write to the target directory.
Summary
The first solution will be less accurate than the second one, even though it is more correct from a purist point of view. Which solution you should pick depends on your needs. Do you prefer false positives or false negatives? The first solution can give you false positives (it accepts a path the system later rejects), while the second one can give you false negatives (it rejects a path that is actually valid).
Comments
For those reading this in the mid-2020s, there is now a standard library function that covers part of this: fs.ValidPath() (in the io/fs package).
Here is the official example, from Go's documentation. Note that this function gives the same result on Windows and on any Unix-like system, because it checks slash-separated, unrooted io/fs path names rather than native OS paths.
// ValidPath reports whether the given path name
// is valid for use in a call to Open.
//
// Path names passed to open are UTF-8-encoded,
// unrooted, slash-separated sequences of path elements, like “x/y/z”.
// Path names must not contain an element that is “.” or “..” or the empty string,
// except for the special case that the name "." may be used for the root directory.
// Paths must not start or end with a slash: “/x” and “x/” are invalid.
//
// Note that paths are slash-separated on all systems, even Windows.
// Paths containing other characters such as backslash and colon
// are accepted as valid, but those characters must never be
// interpreted by an [FS] implementation as path element separators.
package main

import (
	"fmt"
	"io/fs"
)

func main() {
	paths := []string{
		".",
		"x",
		"x/y/z",
		"",
		"..",
		"/x",
		"x/",
		"x//y",
		"x/./y",
		"x/../y",
	}
	for _, path := range paths {
		fmt.Printf("ValidPath(%q) = %t\n", path, fs.ValidPath(path))
	}
}
aux or aux.ext where ext is any extension you like since aux was a reserved keyword for accessing the aux port in msdos or something like that:character. On Windows all these are disallowed: ?*:|\". Ext3 filesystem for Unix only disallows / and the null char, but Linux also handles NTFS and FAT file systems (which is transparent to your app), so I would only allow path names that are allowed on all systems... I would disallow the NULL char and all of these: ?%*:|\".