1

Approximate string matching is not a stranger problem.

I am learning and trying to understand how to solve it. I even now don't want to get too deep into it and just want to understand the brute-force way.

In its wiki page (Approximate string matching), it says

A brute-force approach would be to compute the edit distance to P (the pattern) for all substrings of T, and then choose the substring with the minimum distance. However, this algorithm would have the running time O(m * n^3), n is the length of T, m is the length of P

Ok. I understand this statement in the following way:

  1. We find out all possible substrings of T
  2. We compute the edit distance of each pair of strings {P, t1}, {P, t2}, ...
  3. We find out which substring has the shortest distance from P and this substring is the answer.

I have the following question:

a. I can use two for-loop to get all possible substrings and this requires O(n^2). So when I try to compute the edit distance of one substring and the patter, does it need O(n*m)? Why?

b. How exactly do I compute the distance of one pair (one substring and the patter)? I know I can insert, delete, substitute, but can anyone give me a algorithm that do just the calculation for one pair?

Thanks


Edit

Ok, I should use Levenshtein distance, but I don't quite understand its method.

Here is part of the code

for j from 1 to n
{
    for i from 1 to m
    {
      if s[i] = t[j] then  
        d[i, j] := d[i-1, j-1]       // no operation required
      else
        d[i, j] := minimum
                   (
                     d[i-1, j] + 1,  // a deletion
                     d[i, j-1] + 1,  // an insertion
                     d[i-1, j-1] + 1 // a substitution
                   )
    }
  }

So, assume I am now comparing {"suv", "svi"}.

So 'v' != 'i', then I have to see three other pairs:

  1. {"su", "sv"}
  2. {"suv", "sv"}
  3. {"su", "svi"}

How can I understand this part? Why I need to see these 3 parts?

Does the distance between two prefixes mean that we need distance number of changes in order to make the two prefixes (or strings) equal?

So, let's take a look at {"su", "sv"}. We can see that distance of {"su", "sv"} is 1. Then how can {"su", "sv"} become {"suv", "svi"} by just adding 1? I think we need to insert 'v' into "su" and 'v' into "sv" and then substitute the last 'i' with 'v', which has 3 operations involved, right?

4
  • The most common algorithm for word pair distance is en.wikipedia.org/wiki/Levenshtein_distance Commented May 28, 2012 at 22:52
  • @biziclop: I didn't see your comment in time - post it as an answer instead, and I'll delete mine. Commented May 28, 2012 at 22:54
  • @Aasmund Eldhuset Never mind :) Commented May 28, 2012 at 22:55
  • @biziclop could you please have a look at my new Edit part? Commented May 29, 2012 at 15:36

1 Answer 1

1

The standard way of measuring the edit distance between two strings is called Levenshtein distance - the wikipedia page contains pseudocode for the algorithm.

As for your edit: You need to look at {"su", "sv"} because it is possible that the best way to change "suv" into "svi" is to replace the last v by i, whose cost will come on top of the cost for changing "su" to "sv". Or, it could be that the best way is to change "suv" into "sv" somehow and then add an i. Or, it could be that the best way is to first delete the v from "suv" and then change "su" into "svi". The first way turns out to be best (or as good as the other options) in this case. The edit distance is indeed 2, and the operations are to change the u into a v and the v into an i.

Sign up to request clarification or add additional context in comments.

1 Comment

And from that pseudocode you can also see why it takes n*m steps.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.