
In a C# application I need to replace all unwanted characters with "Ã". Following is the allowed character array.

string[] wantedCharacters = new string[] { " ", "!", "\"", "#", "$", "%", "&", "\'", "(", ")", "*", "+", ",", "-", ".", "/", "0", "1", "2", "3", "4", "5", "6", "7", "8", "9", ":", ";", "<", "=", ">", "?", "@", "A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "K", "L", "M", "N", "O", "P", "Q", "R", "S", "T", "U", "V", "W", "X", "Y", "Z", "[", "\\", "]", "^", "_", "`", "a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m", "n", "o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z", "{", "|", "}", "~" };

Every character not in this array should be replaced with "Ã". I have done it by looping over all the characters in the string, but it takes too much time to execute. I am looking for a regular expression to do this. Any help will be appreciated.

    Why do you think Regex would be faster than actually looping through each character? Commented Feb 22, 2013 at 6:38

3 Answers


[^c] matches any single character that is not c. Replace c with your allowed characters and pass the resulting regex to the Replace method:

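using System.Text.RegularExpressions;
// Negated class: matches any character NOT in the allowed list (",-." is a range that also covers "-")
var reg = new Regex(@"[^ !""#$%&'()*+,-./0-9:;<=>?@A-Z\[\\\]^_`a-z{|}~]");
var result = reg.Replace(inputString, "Ã");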
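A minimal self-contained sketch of this answer's approach, assuming the pattern is reused across many strings (hence a single static `Regex` built once with `RegexOptions.Compiled`). `Sanitizer` and `ReplaceUnwanted` are hypothetical names for illustration, and the placeholder is written as the escape `\u00C3` ("Ã") to avoid source-encoding surprises.

```csharp
using System.Text.RegularExpressions;

public static class Sanitizer
{
    // Negated character class: matches any single character NOT in the allowed set.
    // ",-." is a range (0x2C to 0x2E), which is how "-" ends up included.
    private static readonly Regex Unwanted =
        new Regex(@"[^ !""#$%&'()*+,-./0-9:;<=>?@A-Z\[\\\]^_`a-z{|}~]", RegexOptions.Compiled);

    // Hypothetical helper: replaces every disallowed character with the placeholder ("Ã" by default).
    public static string ReplaceUnwanted(string input, string placeholder = "\u00C3")
    {
        return Unwanted.Replace(input, placeholder);
    }
}
```

For example, `Sanitizer.ReplaceUnwanted("na\u00EFve caf\u00E9")` returns `"naÃve cafÃ"`, while a string made only of allowed characters comes back unchanged.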

I would not use a regex here; it will be less readable.

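using System.Collections.Generic;
using System.Linq;
string input = "..";
const char placeholderChar = 'Ã';
// wantedCharacters is a string[], so take the single char from each entry
HashSet<char> wantedCharactersSet = new HashSet<char>(wantedCharacters.Select(s => s[0]));
char[] chars = input.ToCharArray(); // strings are immutable, so modify a copy
for (int i = 0; i < chars.Length; i++)
{
    if (!wantedCharactersSet.Contains(chars[i]))
        chars[i] = placeholderChar;
}
string result = new string(chars);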

Notice that HashSet<T>.Contains() is O(1) on average, while searching an array is O(n).
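To make the O(1) vs O(n) point concrete, here is a sketch that accepts the question's `string[]` as-is and converts it once into a `HashSet<char>`, so each character costs a single set lookup instead of a scan of all 95 entries. `LoopSanitizer` and `ReplaceUnwanted` are hypothetical names, and building the set in a constructor assumes the same allowed list is reused for many inputs.

```csharp
using System.Collections.Generic;
using System.Linq;

public sealed class LoopSanitizer
{
    private readonly HashSet<char> _wanted;
    private readonly char _placeholder;

    // Build the set once; each entry of the question's string[] holds a single character.
    public LoopSanitizer(IEnumerable<string> wantedCharacters, char placeholder = '\u00C3') // 'Ã'
    {
        _wanted = new HashSet<char>(wantedCharacters.Select(s => s[0]));
        _placeholder = placeholder;
    }

    public string ReplaceUnwanted(string input)
    {
        char[] chars = input.ToCharArray(); // strings are immutable, so work on a copy
        for (int i = 0; i < chars.Length; i++)
        {
            // O(1) average lookup; Array.IndexOf over the string[] would scan every entry (O(n)).
            if (!_wanted.Contains(chars[i]))
                chars[i] = _placeholder;
        }
        return new string(chars);
    }
}
```

With the question's array, `new LoopSanitizer(wantedCharacters).ReplaceUnwanted("na\u00EFve caf\u00E9")` should yield `"naÃve cafÃ"`.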

Comments

I think regex is an acceptable answer for this use case, although I don't have a way to check how fast the regex approach performs against the HashSet approach. (HashSet is not needed anyway for the OP's case specifically).
@nhahtdh: HashSet lookup performance will always be better than that of an array.
@nhahtdh: If performance does matter, I doubt RegEx is faster. If it does not matter, the loop is still more readable, imo.
I think for this specific case, Regex is quite readable if written correctly. Listing all the characters out is far more confusing and error-prone.

It seems that you are trying to restrict the string to the printable ASCII characters (codes 0x20 to 0x7E), so you can use this regex:

[^\x20-\x7E]

The regex will match all unwanted characters.

The same regex as a C# verbatim string literal:

@"[^\x20-\x7E]"

Use this regex with the Replace method: replace with an empty string to remove all unwanted characters, or with a placeholder character of your choice.
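A minimal sketch of both uses with the `[^\x20-\x7E]` pattern; `AsciiFilter`, `ReplaceUnwanted` and `RemoveUnwanted` are hypothetical names. This is equivalent to the question's explicit list only because that list happens to contain exactly the 95 printable ASCII characters, 0x20 (space) through 0x7E (~).

```csharp
using System.Text.RegularExpressions;

public static class AsciiFilter
{
    // Matches any character outside the printable ASCII range 0x20 (space) to 0x7E (~).
    private static readonly Regex NonPrintableAscii = new Regex(@"[^\x20-\x7E]");

    // Replace each unwanted character with a placeholder ("Ã" by default, written as an escape).
    public static string ReplaceUnwanted(string input, string placeholder = "\u00C3")
        => NonPrintableAscii.Replace(input, placeholder);

    // Replacing with an empty string removes the unwanted characters instead.
    public static string RemoveUnwanted(string input)
        => NonPrintableAscii.Replace(input, string.Empty);
}
```

Note that .NET regexes work on UTF-16 code units, so a character outside the Basic Multilingual Plane (an emoji, for example) is a surrogate pair and becomes two placeholders rather than one.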
