I have a small helper that converts a Unicode string to a non-Unicode string. It looks like this:

using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class StringHelper
{
    public static string ToNonUnicode(this string source)
    {
        if (!string.IsNullOrEmpty(source))
        {
            source = source.Trim().Replace(".", "");

            #region rule
            IDictionary<string, string> dict = new Dictionary<string, string>
            {
                { @"\-|\,", "" },
                { @"\s{2}", " " },

                { "à|á|ả|ã|ạ|ă|â|ấ|ầ|ẩ|ẫ|ậ|ằ|ẳ|ắ|ẵ|ặ", "a" },
                { "á|à|ả|ã|ạ|â|ă|ấ|ầ|ẩ|ẫ|ậ|ắ|ằ|ẳ|ẵ|ặ", "a" },

                { "À|Á|Ả|Ã|Ạ|Ă|Â|Ầ|Ấ|Ẩ|Ẫ|Ậ|Ằ|Ắ|Ẳ|Ẵ|Ặ", "A" },
                { "Á|À|Ả|Ã|Ạ|Â|Ă|Ấ|Ầ|Ẩ|Ẫ|Ậ|Ắ|Ằ|Ẳ|Ẵ|Ặ", "A" },

                { "ò|ó|ỏ|õ|ọ|ô|ơ|ồ|ố|ổ|ỗ|ộ|ờ|ớ|ở|ỡ|ợ", "o" },
                { "ó|ò|ỏ|õ|ọ|ô|ơ|ố|ồ|ổ|ỗ|ộ|ớ|ờ|ở|ỡ|ợ", "o" },

                { "Ò|Ó|Ỏ|Õ|Ọ|Ô|Ơ|Ồ|Ố|Ổ|Ỗ|Ộ|Ờ|Ớ|Ở|Ỡ|Ợ", "O" },
                { "Ó|Ò|Ỏ|Õ|Ọ|Ô|Ơ|Ố|Ồ|Ổ|Ỗ|Ộ|Ớ|Ờ|Ở|Ỡ|Ợ", "O" },

                { "è|é|ẻ|ẽ|ẹ|ê|ề|ế|ể|ễ|ệ", "e" },
                { "é|è|ẻ|ẽ|ẹ|ê|ế|ề|ể|ễ|ệ", "e" },

                { "È|É|Ẻ|Ẽ|Ẹ|Ê|Ề|Ế|Ể|Ễ|Ệ", "E" },
                { "É|È|Ẻ|Ẽ|Ẹ|Ê|Ế|Ề|Ể|Ễ|Ệ", "E" },

                { "ù|ú|ủ|ũ|ụ|ư|ừ|ứ|ử|ữ|ự", "u" },
                { "ú|ù|ủ|ũ|ụ|ư|ứ|ừ|ử|ữ|ự", "u" },

                { "Ù|Ú|Ủ|Ũ|Ụ|Ư|Ừ|Ứ|Ử|Ữ|Ự", "U" },
                { "Ú|Ù|Ủ|Ũ|Ụ|Ư|Ứ|Ừ|Ử|Ữ|Ự", "U" },

                { "ì|í|ỉ|ĩ|ị|í|ì|ỉ|ĩ|ị", "i" },
                { "Ì|Í|Ỉ|Ĩ|Ị|Í|Ì|Ỉ|Ĩ|Ị", "I" },

                { "ỳ|ý|ỷ|ỹ|ỵ|ý|ỳ|ỷ|ỹ|ỵ", "y" },
                { "Ỳ|Ý|Ỷ|Ỹ|Ỵ|Ý|Ỳ|Ỷ|Ỹ|Ỵ", "Y" },

                { "đ", "d" }, { "Đ", "D" }
            };
            #endregion

            foreach (var d in dict)
            {
                var matches = Regex.Matches(source, d.Key);
                foreach (Match match in matches)
                {
                    source = Regex.Replace(source, match.Value, d.Value);
                }
            }                
        }            
        return source;
    }
}

Test:

string str = "Làm người yêu em nhé baby...";
string res = str.ToNonUnicode(); // "Lam nguoi yeu em nhe baby"

To achieve that, I have to loop twice: once to find the matches and once to replace them. I'm looking for a shorter way to write this. I think LINQ might be one option, but I don't know where to start.

Can you give me some tips? Thank you!

  • Wouldn't foreach (var d in dict) source = Regex.Replace(source, d.Key, d.Value); do the same thing without an extra Regex.Matches? From MSDN: "If pattern is not matched in the current instance, the method returns the current instance unchanged." Commented Jan 13, 2017 at 22:27
  • @MarcinJuraszek Many thanks! My bad. Commented Jan 13, 2017 at 22:34

1 Answer

You don't need the Matches loop. Regex.Replace already replaces every match of the pattern and returns the input unchanged when nothing matches, so you can call it directly:

foreach (var d in dict)
{
    source = Regex.Replace(source, d.Key, d.Value);
}   
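
Since the question asks about LINQ, here is a minimal sketch of the same loop folded into a single Enumerable.Aggregate call. The class name StringHelperSketch, the method name ToNonUnicodeSketch, and the shortened rule list are hypothetical and only for illustration; the full rule set is the one in the question. The rules are kept in an array rather than a Dictionary because the enumeration order of Dictionary<TKey, TValue> is documented as undefined, which could matter if one rule depends on another having run first.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class StringHelperSketch
{
    // Hypothetical, abbreviated rule list (pattern -> replacement).
    // An array keeps the rules in the order they are written.
    private static readonly KeyValuePair<string, string>[] Rules =
    {
        new KeyValuePair<string, string>(@"\-|\,", ""),
        new KeyValuePair<string, string>("à|á|ả|ã|ạ", "a"),
        new KeyValuePair<string, string>("è|é|ẻ|ẽ|ẹ", "e"),
        new KeyValuePair<string, string>("đ", "d"),
    };

    public static string ToNonUnicodeSketch(this string source)
    {
        // Same guard as the original: null or empty input is returned as-is.
        if (string.IsNullOrEmpty(source))
        {
            return source;
        }

        // Start from the trimmed, dot-free string and apply each rule in turn.
        // Regex.Replace returns its input unchanged when the pattern does not match.
        return Rules.Aggregate(
            source.Trim().Replace(".", ""),
            (current, rule) => Regex.Replace(current, rule.Key, rule.Value));
    }
}
```

With these abbreviated rules, "đá, bé.".ToNonUnicodeSketch() should give "da be". This is unlikely to run any faster than the plain foreach above; it is only a more compact way to write the same thing.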
1 Comment

Please accept the answer if it works for you. It helps other users see that this problem is solved, and it gives me sweet, sweet rep :)
