native string matching algorithm

Question

Following is a very famous question in native string matching. Please can someone explain me the answer.

Suppose that all characters in the pattern P are different. Show how to accelerate NAIVE-STRING MATCHER to run in time O(n) on an n-character text T.

Bernhard Barker · Accepted Answer · 2013-07-04 23:17:39Z

The basic idea:

Iterate through the input and the pattern at the same time, comparing their characters to each other
Whenever you get a non-matching character between the two, you can just reset the pattern position and keep the input position as is

This works because the pattern characters are all different, which means that whenever you have a partial match, there can be no other match overlapping with that, so we can just start looking from the end of the partial match.

Here's some pseudo-code that shouldn't be too difficult to understand:

input[n]
pattern[k]
pPos = 0
iPos = 0
while iPos < n
  if pPos == k
    FOUND!
  if pattern[pPos] == input[iPos]
    pPos++
    iPos++
  else
    // if pPos is already 0, we need to increase iPos,
    //   otherwise we just keep comparing the same characters
    if pPos == 0
      iPos++
    pPos = 0

It's easy to see that iPos increases at least every second loop, thus there can be at most 2n loop runs, making the running time O(n).

Krishey · Accepted Answer · 2013-10-08 01:10:22Z

1

When T[i] and P[j] mismatches in NAIVE-STRING-MATCHER, we can skip all characters before T[i] and begin new matching from T[i + 1] with P[1].

NAIVE-STRING-MATCHER(T, P)

1 n length[T]

2 m length[P]

3 for s 0 to n - m

4 do if P[1 . . m] = T[s + 1 . . s + m]

5 then print "Pattern occurs with shift" s

edited Oct 8, 2013 at 1:10

answered Oct 8, 2013 at 1:04

Krishey

112 bronze badges

Comments

Majid Zandi · Accepted Answer · 2015-06-22 06:02:58Z

Naive string search algorithm implementations in Python 2.7: https://gist.github.com/heyhuyen/4341692

In the middle of implementing Boyer-Moore's string search algorithm, I decided to play with my original naive search algorithm. It's implemented as an instance method that takes a string to be searched. The object has an attribute 'pattern' which is the pattern to match.

1) Here is the original version of the search method, using a double for-loop. Makes calls to range and len

def search(self, string):
    for i in range(len(string)):
        for j in range(len(self.pattern)):
            if string[i+j] != self.pattern[j]:
                break
            elif j == len(self.pattern) - 1:
                return i
    return -1

2) Here is the second version, using a double while-loop instead. Slightly faster, not making calls to range

def search(self, string):
    i = 0
    while i < len(string):
        j = 0
        while j < len(self.pattern) and self.pattern[j] == string[i+j]:
            j += 1
        if j == len(self.pattern):
            return i
        i += 1
    return -1

3) Here is the original, replacing range with xrange. Faster than both of the previous two.

def search(self, string):
    for i in xrange(len(string)):
        for j in xrange(len(self.pattern)):
            if string[i+j] != self.pattern[j]:
                break
            elif j == len(self.pattern) - 1:
                return i
    return -1

4) Storing values in local variables = win! With the double while loop, this is the fastest.

def search(self, string):
    len_pat = len(self.pattern)
    len_str = len(string)
    i = 0
    while i < len_str:
        j = 0
        while j < len_pat and self.pattern[j] == string[i+j]:
            j += 1
        if j == len_pat:
            return i
        i += 1
    return -1

Collectives™ on Stack Overflow

native string matching algorithm

3 Answers 3

Comments

Comments

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

3 Answers 3

Comments

Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Related