2

Following is a very famous question in native string matching. Please can someone explain me the answer.

Suppose that all characters in the pattern P are different. Show how to accelerate NAIVE-STRING MATCHER to run in time O(n) on an n-character text T.

3 Answers 3

8

The basic idea:

  • Iterate through the input and the pattern at the same time, comparing their characters to each other
  • Whenever you get a non-matching character between the two, you can just reset the pattern position and keep the input position as is

This works because the pattern characters are all different, which means that whenever you have a partial match, there can be no other match overlapping with that, so we can just start looking from the end of the partial match.

Here's some pseudo-code that shouldn't be too difficult to understand:

input[n]
pattern[k]
pPos = 0
iPos = 0
while iPos < n
  if pPos == k
    FOUND!
  if pattern[pPos] == input[iPos]
    pPos++
    iPos++
  else
    // if pPos is already 0, we need to increase iPos,
    //   otherwise we just keep comparing the same characters
    if pPos == 0
      iPos++
    pPos = 0

It's easy to see that iPos increases at least every second loop, thus there can be at most 2n loop runs, making the running time O(n).

Sign up to request clarification or add additional context in comments.

Comments

1

When T[i] and P[j] mismatches in NAIVE-STRING-MATCHER, we can skip all characters before T[i] and begin new matching from T[i + 1] with P[1].

NAIVE-STRING-MATCHER(T, P)

1 n length[T]

2 m length[P]

3 for s 0 to n - m

4 do if P[1 . . m] = T[s + 1 . . s + m]

5 then print "Pattern occurs with shift" s

Comments

0

Naive string search algorithm implementations in Python 2.7: https://gist.github.com/heyhuyen/4341692

In the middle of implementing Boyer-Moore's string search algorithm, I decided to play with my original naive search algorithm. It's implemented as an instance method that takes a string to be searched. The object has an attribute 'pattern' which is the pattern to match.

1) Here is the original version of the search method, using a double for-loop. Makes calls to range and len

def search(self, string):
    for i in range(len(string)):
        for j in range(len(self.pattern)):
            if string[i+j] != self.pattern[j]:
                break
            elif j == len(self.pattern) - 1:
                return i
    return -1

2) Here is the second version, using a double while-loop instead. Slightly faster, not making calls to range

def search(self, string):
    i = 0
    while i < len(string):
        j = 0
        while j < len(self.pattern) and self.pattern[j] == string[i+j]:
            j += 1
        if j == len(self.pattern):
            return i
        i += 1
    return -1

3) Here is the original, replacing range with xrange. Faster than both of the previous two.

def search(self, string):
    for i in xrange(len(string)):
        for j in xrange(len(self.pattern)):
            if string[i+j] != self.pattern[j]:
                break
            elif j == len(self.pattern) - 1:
                return i
    return -1

4) Storing values in local variables = win! With the double while loop, this is the fastest.

def search(self, string):
    len_pat = len(self.pattern)
    len_str = len(string)
    i = 0
    while i < len_str:
        j = 0
        while j < len_pat and self.pattern[j] == string[i+j]:
            j += 1
        if j == len_pat:
            return i
        i += 1
    return -1

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.