4

I'm looking for a way to take any string and return a version in which every non-alphanumeric character is replaced with _.

For example, the string "ASD@#$123" becomes "ASD___123".

Thanks.

2
  • return new string(txt.Where(Char.IsLetterOrDigit).ToArray()) shrinks the string; still thinking about how to replace the characters instead :) Commented May 15, 2012 at 19:01
  • possible duplicate of Replace all Special Characters in a string IN C# Commented May 15, 2012 at 19:07

3 Answers

10

For most string operations, regular expressions are more concise than LINQ (though not necessarily faster; see the edit at the end of this answer):

string input = "ASD@#$123";
string result = Regex.Replace(input, "[^A-Z0-9]", "_", RegexOptions.IgnoreCase);

If you want to preserve every Unicode alphanumeric character, including non-ASCII letters such as é, you can use the non-word character class \W, which makes the pattern even simpler:

string input = "ASD@#$123";
string result = Regex.Replace(input, @"\W", "_");

For the sake of comparison, here is the same conversion done using LINQ (allowing just ASCII letters and digits):

string input = "ASD@#$123";
string result =
    new string(input.Select(c => 
        c >= 'A' && c <= 'Z' || c >= 'a' && c <= 'z' || c >= '0' && c <= '9' ? c : '_'
    ).ToArray());

Or, if Char.IsLetterOrDigit meets your requirements:

string input = "ASD@#$123";
string result = 
    new string(input.Select(c => char.IsLetterOrDigit(c) ? c : '_').ToArray());

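Note that Char.IsLetterOrDigit allows non-ASCII letters and digits. It is comparable, though not identical, to the \w word character class whose negation was used in the second example: \w also matches the underscore and other connector punctuation, as well as non-spacing marks.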
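To make the difference between the ASCII-only and Unicode-aware variants concrete, here is a minimal sketch. The class and method names (SanitizeDemo, AsciiOnly, UnicodeAware) are made up for illustration and do not appear in the answer above.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper class, for illustration only.
public static class SanitizeDemo
{
    // ASCII-only: anything outside A-Z, a-z, 0-9 becomes '_'.
    public static string AsciiOnly(string input) =>
        Regex.Replace(input, "[^A-Za-z0-9]", "_");

    // Unicode-aware: keeps every char that char.IsLetterOrDigit accepts.
    public static string UnicodeAware(string input) =>
        new string(input.Select(c => char.IsLetterOrDigit(c) ? c : '_').ToArray());
}
```

With the input "caf\u00E9#1" (that is, "café#1" with a precomposed é), AsciiOnly returns "caf__1" while UnicodeAware returns "café_1". Both return "ASD___123" for the string in the question.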

Edit: As Steve Wortham has observed, the LINQ versions are actually more than 3× faster than the regex (even when a Regex instance is created in advance with RegexOptions.Compiled and re-used).
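If you want to check the timing on your own machine, a rough sketch with Stopwatch could look like the following. The ReplaceTiming class, its method names, and the iteration count you pass in are assumptions for illustration; this is not the benchmark Steve Wortham ran, and a dedicated benchmarking library would give more trustworthy numbers.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical timing harness, for illustration only.
public static class ReplaceTiming
{
    // Regex instance created once and reused, as described in the edit above.
    private static readonly Regex NonAlnum =
        new Regex("[^A-Z0-9]", RegexOptions.IgnoreCase | RegexOptions.Compiled);

    public static string WithRegex(string input) => NonAlnum.Replace(input, "_");

    public static string WithLinq(string input) =>
        new string(input.Select(c =>
            c >= 'A' && c <= 'Z' || c >= 'a' && c <= 'z' || c >= '0' && c <= '9' ? c : '_'
        ).ToArray());

    // Runs the given replacement 'iterations' times and returns the elapsed time.
    public static TimeSpan Time(Func<string, string> replace, string input, int iterations)
    {
        replace(input); // warm-up call so one-time setup cost is not measured
        var stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            replace(input);
        }
        stopwatch.Stop();
        return stopwatch.Elapsed;
    }
}
```

For example, compare ReplaceTiming.Time(ReplaceTiming.WithRegex, "ASD@#$123", 1000000) with ReplaceTiming.Time(ReplaceTiming.WithLinq, "ASD@#$123", 1000000). The results will vary with the runtime version and the input length.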


7 Comments

Ö or α, for example, are alphanumeric chars :)
@L.B: Since this is a redaction operation, it’s more reasonable to assume that non-ASCII characters are not permitted (although I made a note to that effect at the end).
@L.B: For the sake of clarity, I’ve added another example which preserves Unicode letters.
+1. Although I must note that even though your first Regex solution is concise, it is over 3 times slower than your Linq solutions. It helps somewhat to enable RegexOptions.Compiled, but Linq still wins the race easily.
@SteveWortham: Strange… Let me test it out on my end.
0
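// Characters to replace; extend this list as needed.
char[] unwanted = new[] {'@', '#', '$'};

// 'query' and 'SomePropertyName' are placeholders for your own collection and property.
foreach (var x in query)
{
    // Split on each unwanted character, then rejoin the pieces with '_'.
    x.SomePropertyName = string.Join("_", x.SomePropertyName.Split(unwanted));
}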

LINQ lambda expression to replace multiple characters in a string
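The Split/Join idea above can be tried on a plain string with a small self-contained sketch. The SplitJoinDemo class and ReplaceListed method are hypothetical names. Note that this approach replaces only the characters you list, so it does not cover every non-alphanumeric character unless the list is exhaustive.

```csharp
using System;

// Hypothetical helper class, for illustration only.
public static class SplitJoinDemo
{
    // Replaces each occurrence of a listed character with '_'.
    // Split keeps empty entries by default, so runs of unwanted
    // characters yield one '_' per character.
    public static string ReplaceListed(string input, char[] unwanted) =>
        string.Join("_", input.Split(unwanted));
}
```

Calling SplitJoinDemo.ReplaceListed("ASD@#$123", new[] {'@', '#', '$'}) returns "ASD___123", whereas a character missing from the list, such as '!', would pass through unchanged.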

0

Here is the function for you:

    String ReplaceWrongChars(String baseString)
    {
        Regex rx = new Regex("[^A-Za-z0-9 ]", RegexOptions.CultureInvariant);
        String rv = rx.Replace(baseString, "_");

        return rv;
    }

This pattern keeps spaces. If spaces should be replaced as well, use "[^A-Za-z0-9]" as the regular expression.
