I am using two strings for a matching program like this:

string s1 = "5-4-6-+1-+1+1+3000+12+21-+1-+1-+1-+2-3-4-5-+1-+10+1-+1-+";
string s2 = "6-+1-+1+1+3000+12+21-+1-+1-+1-+1-+1-+1+1-+1-+";

I want to write a Regex matching function that compares the parts between each "+" separately and calculates the match percent, i.e. the share of parts in which the two strings have at least one number in common. In this example the matches are as follows ("--" marks a part with no match):

6, 1, 1, 1, 3000, 12, 21, 1, 1, 1, --, 1, --, 1, 1

In this example 13 of the 15 parts match, so the match percent is 13*100/15 ≈ 87%.

Currently I am using the function below, but I think it is not optimized and using Regex may be faster.

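// Requires: using System; using System.Linq;
public double MatchPercent(string s1, string s2) {
    // Split on '+', dropping the empty entry left by the trailing '+'.
    string[] user = s1.Split(new[] { '+' }, StringSplitOptions.RemoveEmptyEntries);
    string[] policy = s2.Split(new[] { '+' }, StringSplitOptions.RemoveEmptyEntries);
    if (user.Length == 0) return 0;

    int matches = 0;
    // Compare parts position by position; parts without a partner cannot match.
    for (int i = 0; i < Math.Min(user.Length, policy.Length); i++) {
        var u = user[i].Split('-').Where(a => a != "").Select(n => Convert.ToInt32(n));
        var p = policy[i].Split('-').Where(a => a != "").Select(n => Convert.ToInt32(n));
        // A part counts as a match if both sides share at least one number.
        if (u.Intersect(p).Any()) {
            matches += 1;
        }
    }
    // Divide by the number of parts in s1 (not its character count);
    // 100.0 forces floating-point division before rounding.
    return Math.Round(matches * 100.0 / user.Length);
}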
  • 1
    I don't understand what you want to do. In your for loop, you don't use the iterator value, so you should always get either a 98% match or a 0% match. Commented Jun 8, 2013 at 9:29
  • 6
    I don't think regular expressions will work. Specifically, I don't think you can maintain state (i.e. the sameness count) over a regex this way. And calculating this after the match would require a variable number of capture groups. Commented Jun 8, 2013 at 16:05
  • This function first splits the two strings on "+" and then finds the matching numbers in each part. @KirillBestemyanov I edited the function again; it was a typo on my part. Commented Jun 8, 2013 at 17:02
  • 9
    This is essentially an alignment problem. You need a suitable sequence alignment algorithm here, not regular expressions. Commented Jun 8, 2013 at 18:28
  • 3
    Konrad's right; instead of making your job easier, switching to a regex solution will make it much more difficult, if not impossible. Commented Jun 8, 2013 at 19:00

1 Answer

A better solution would be the Levenshtein word distance algorithm. Some C# samples:

From the number of matching characters you can also calculate the percentage.
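
As a rough illustration of that idea, here is a minimal sketch that runs the Levenshtein recurrence over the "+"-separated parts instead of over characters. Several things here are assumptions rather than anything given in the question: the class and method names (`SegmentAlignment`, `Parse`, `EditDistance`, `SimilarityPercent`) are made up for this example, two parts are treated as equal when they share at least one number (the question's own notion of a match), and the percentage is taken as the share of the longer sequence that does not need an edit.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SegmentAlignment
{
    // Splits "5-4-6-+1-+..." on '+' and turns every part into a set of ints.
    // Empty entries (e.g. after the trailing '+') are dropped.
    public static List<HashSet<int>> Parse(string s)
    {
        return s.Split(new[] { '+' }, StringSplitOptions.RemoveEmptyEntries)
                .Select(part => new HashSet<int>(
                    part.Split(new[] { '-' }, StringSplitOptions.RemoveEmptyEntries)
                        .Select(n => int.Parse(n))))
                .ToList();
    }

    // Classic Levenshtein dynamic programming table, applied to parts.
    // Assumption: substituting one part for another is free when the two
    // parts share at least one number, and costs 1 otherwise.
    public static int EditDistance(IList<HashSet<int>> a, IList<HashSet<int>> b)
    {
        var d = new int[a.Count + 1, b.Count + 1];
        for (int i = 0; i <= a.Count; i++) d[i, 0] = i; // delete all of a
        for (int j = 0; j <= b.Count; j++) d[0, j] = j; // insert all of b

        for (int i = 1; i <= a.Count; i++)
        {
            for (int j = 1; j <= b.Count; j++)
            {
                int cost = a[i - 1].Overlaps(b[j - 1]) ? 0 : 1;
                d[i, j] = Math.Min(
                    Math.Min(d[i - 1, j] + 1,   // deletion
                             d[i, j - 1] + 1),  // insertion
                    d[i - 1, j - 1] + cost);    // match or substitution
            }
        }
        return d[a.Count, b.Count];
    }

    // Assumption: similarity = share of the longer sequence left unedited.
    public static double SimilarityPercent(string s1, string s2)
    {
        var a = Parse(s1);
        var b = Parse(s2);
        int longest = Math.Max(a.Count, b.Count);
        if (longest == 0) return 100.0; // two empty inputs are identical
        return Math.Round(100.0 * (longest - EditDistance(a, b)) / longest);
    }
}
```

For the two strings in the question, `SegmentAlignment.SimilarityPercent(s1, s2)` gives an edit distance of 2 over 15 parts, i.e. 87, which agrees with the position-by-position count. Unlike a purely positional comparison, this version should also tolerate a part being inserted or dropped in one of the strings, at the cost of O(n·m) time in the number of parts.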
