
I have a directory with lots of folders and sub-folders, all with files in them. The idea of my project is to recurse through the entire directory, gather up all the file names, and replace invalid characters (invalid for a SharePoint migration).

However, I'm completely unfamiliar with regular expressions. The characters I need to get rid of in file names are: ~, #, %, &, *, {, }, \, /, :, <, >, ?, -, | and " (double quote). I want to replace each of these characters with a blank space. I was hoping to use the string.Replace() method to go through all these file names and do the replacement.
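For comparison, the string.Replace() idea also works without any regular expressions. A minimal sketch, assuming the character list above is complete and that a single space is the replacement; invalidChars and ReplaceInvalidChars are hypothetical names:

```csharp
using System;

// The characters listed in the question (assumed to be the full set to strip).
char[] invalidChars = { '~', '#', '%', '&', '*', '{', '}', '\\', '/', ':', '<', '>', '?', '-', '|', '"' };

// Replace every occurrence of each invalid character with one space,
// using the string.Replace(char, char) overload.
string ReplaceInvalidChars(string name)
{
    foreach (char c in invalidChars)
    {
        name = name.Replace(c, ' ');
    }
    return name;
}

Console.WriteLine(ReplaceInvalidChars("Deal A & B.txt")); // prints "Deal A   B.txt" (three spaces)
```

This leaves runs of spaces behind, which is the problem the answers below deal with.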

So far, the only code I've written is the recursion. I was thinking the recursion would scan the drive, fetch the names of these files, and put them in a List<string>.
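For the recursion itself, Directory.EnumerateFiles with SearchOption.AllDirectories already walks every sub-folder, so hand-written recursion may not be needed. A minimal sketch; CollectFilePaths is a hypothetical name, and it keeps full paths rather than bare names on the assumption that you will need them later to rename the files:

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Collect the full path of every file under root, including all nested sub-folders.
List<string> CollectFilePaths(string root)
{
    var paths = new List<string>();
    foreach (string path in Directory.EnumerateFiles(root, "*", SearchOption.AllDirectories))
    {
        paths.Add(path);
    }
    return paths;
}

// Usage: pass the root folder as the first command-line argument.
if (args.Length > 0)
{
    foreach (string path in CollectFilePaths(args[0]))
    {
        // Path.GetFileName strips the folder part, e.g. "Deal A & B.txt".
        Console.WriteLine(Path.GetFileName(path));
    }
}
```

The enumeration may throw UnauthorizedAccessException if it reaches a folder you don't have access to, so on a real drive you may want to wrap it in a try/catch.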

Can anybody help me with how to find and replace those specific invalid characters using Regex?


2 Answers

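// requires: using System.Text.RegularExpressions;
// Verbatim string, so the regex sees \\ (a literal backslash); "" is an escaped double quote.
string pattern = @"[\\~#%&*{}/:<>?|""-]";
string replacement = " ";

Regex regEx = new Regex(pattern);
// Replace each invalid character with a space, then collapse runs of whitespace.
string sanitized = Regex.Replace(regEx.Replace(input, replacement), @"\s+", " ");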

This will replace runs of whitespace with a single space as well.
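A self-contained version of this two-step approach, using a verbatim-string pattern so that backslashes are matched too. Sanitize is a hypothetical helper name:

```csharp
using System;
using System.Text.RegularExpressions;

// The regex engine sees [\\~#%&*{}/:<>?|"-]: \\ is a literal backslash,
// and the trailing - is a literal hyphen because it ends the class.
var invalidChars = new Regex(@"[\\~#%&*{}/:<>?|""-]");
var whitespaceRun = new Regex(@"\s+");

string Sanitize(string input)
{
    // Step 1: each invalid character becomes one space.
    string replaced = invalidChars.Replace(input, " ");
    // Step 2: any run of whitespace (including runs created in step 1) becomes one space.
    return whitespaceRun.Replace(replaced, " ");
}

Console.WriteLine(Sanitize("Deal A & B.txt")); // prints "Deal A B.txt"
```

An invalid character at the very start or end of a name still leaves a space there ("#notes.txt" becomes " notes.txt"), so you may also want to call Trim() on the result.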


Comments:

string pattern = "[\\~#%&*{}/:<>?|\"-]"; is better: less unnecessary escaping.
@Tim thanks! I will edit my solution. Most of my regex experience is in Perl where I use regex literals. So I'm not entirely sure what needs to be escaped and what doesn't in C# or Java. It's mostly trial-and-error.
I just noticed that yeahumok wanted to replace the invalid characters with a space, not the empty string. I have removed the + from my version again, expecting that he wants one space for each invalid character, even if there are several in a row.
Is there a way to get rid of extra spaces? For example, "Deal A & B.txt" becomes "Deal A   B.txt" (3 spaces between the letters). Is there any way to trim that extra space so it looks like "Deal A B.txt" (1 space between the letters)?
string pattern = @"[\\~#%&*{}/:<>?|""-]"; works better, since it also removes backslashes. In the original code (string pattern = "[\\~#%&*{}/:<>?|\"-]";) the regex starts with an escaped tilde, not an escaped backslash.
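The difference between the two string literals can be checked directly. A small sketch (variable names are illustrative) showing that the regular-string pattern does not match a backslash while the verbatim one does:

```csharp
using System;
using System.Text.RegularExpressions;

// Regular string: C# turns "\\~" into \~, so the regex sees an escaped tilde
// and the character class contains no backslash at all.
var regularString = new Regex("[\\~#%&*{}/:<>?|\"-]");

// Verbatim string: the regex sees \\ (an escaped backslash) followed by ~.
var verbatimString = new Regex(@"[\\~#%&*{}/:<>?|""-]");

Console.WriteLine(regularString.IsMatch(@"a\b"));  // False
Console.WriteLine(verbatimString.IsMatch(@"a\b")); // True
```

In a regular string C# consumes one level of escaping before the regex engine ever sees the pattern; a verbatim string passes backslashes through untouched, which is why regex patterns are usually written with @"...".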

Regarding the comment "is there a way to get rid of extra spaces?":

Try something like this:

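// Verbatim string, so that \\ matches a literal backslash.
// " *" on each side swallows the spaces around a run of invalid characters.
string pattern = @" *[\\~#%&*{}/:<>?|""-]+ *";
string replacement = " ";

Regex regEx = new Regex(pattern);
string sanitized = regEx.Replace(input, replacement);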
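A minimal runnable sketch of this single-pattern approach (Sanitize is a hypothetical helper name; the pattern is written as a verbatim string so backslashes are matched):

```csharp
using System;
using System.Text.RegularExpressions;

// A run of invalid characters, plus any spaces directly before or after it,
// is replaced as a whole by a single space.
var invalidRun = new Regex(@" *[\\~#%&*{}/:<>?|""-]+ *");

string Sanitize(string input) => invalidRun.Replace(input, " ");

Console.WriteLine(Sanitize("Deal A & B.txt")); // prints "Deal A B.txt"
Console.WriteLine(Sanitize("a - b"));          // prints "a b"
```

Unlike a separate \s+ pass, this only collapses spaces that touch an invalid character; a double space elsewhere in the name is left as-is.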

Consider learning a bit about regular expressions yourself; they're also very useful in everyday development (e.g. search/replace in Visual Studio).

Comment:

Also, is there any way to remove extraneous '.' (periods) in a file name? For example, with 0.0.0.1.doc, how would I handle this without wiping out the .doc?
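One possible approach, sketched under the assumption that only the last period starts the extension: split the name with Path.GetFileNameWithoutExtension and Path.GetExtension, clean the first part, and put the extension back. ReplaceExtraPeriods is a hypothetical name, and replacing periods with a space by default (rather than deleting them) is an assumption chosen to match the other invalid characters:

```csharp
using System;
using System.IO;

// Replace every '.' in the name part but keep the extension intact.
// Pass "" as the replacement to delete the periods instead.
string ReplaceExtraPeriods(string fileName, string replacement = " ")
{
    string extension = Path.GetExtension(fileName);          // ".doc"
    string stem = Path.GetFileNameWithoutExtension(fileName); // "0.0.0.1"
    return stem.Replace(".", replacement) + extension;
}

Console.WriteLine(ReplaceExtraPeriods("0.0.0.1.doc"));     // prints "0 0 0 1.doc"
Console.WriteLine(ReplaceExtraPeriods("0.0.0.1.doc", "")); // prints "0001.doc"
```

Path.GetExtension treats whatever follows the last period as the extension, so a name with no real extension, such as "Version 1.5", would keep ".5" as its "extension".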
