1

I have implemented the below Naive String search algorithm ('find'). It's as simple as it can get. However I found another way on 'GeeksforGeeks', ('search') it looked to have a better complexity. When I tested it for large strings, the results are drastically different but opposite.

1st: Slice the string into pattern length and compare. Move forward one character slice and compare. What should be the complexity of this?

def find(pat, txt):
    size = len(pat)
    for i in range( len(txt) -size + 1 ):
        if txt[i : i + size] == pat:
            print 'Pattern found at index %s'%(i)

2nd: Compare character by character. If a character doesn't match break. Else continue. In the end if all the characters matched print the result. Move forward one character. What should be the complexity of this?

def search(pat, txt):
    M = len(pat)
    N = len(txt)

    for i in xrange(N-M+1):
        status = 1 
        for j in xrange(M):
            if txt[i+j] != pat[j]:
                status = 0
                break
        if j == M-1 and status != 0:
            print "Pattern found at index " + str(i)

Timing test Cases:

testString = ''.join([ 'a' for _ in range(1000*100)] ) + 'b'
testPattern = ''.join([ 'a' for _ in range(100*100) ])  + 'b'

import cProfile
cProfile.run('find(testPattern, testString)')
cProfile.run('search(testPattern, testString)')

for find

Pattern found at index 90000
         90007 function calls in 0.160 seconds

For search

Pattern found at index 90000
         5 function calls in 135.951 seconds

In my algo find I do slicing and comparison. The time complexity for slicing is O(k), similarly for comparison it should take another O(k) though not sure. Python Time Complexity

While in search we run the loop just 'k' times. So shouldn't it have a better time complexity.

13
  • Whether algorithm A does better than algorithm B on a given test and whether or not algorithm A has better time complexity than algorithm B are two separate questions. For one thing -- complexity results are asymptotic. For another thing, the constant of proportionality in big-O notation can hide a multitude of inefficiencies. I don't know what is happening here -- but you do seem to be comparing apples and oranges. Commented Dec 17, 2016 at 13:37
  • @JohnColeman but we can always talk of worst case scenario. I tried using different strings. But in general the 1st algo is faster, and that too there is drastic difference. I am trying to understand where am I getting wrong. Let's say just for this case, why are these results they way there are? Commented Dec 17, 2016 at 13:41
  • 1
    you can profile this and find out, look up the standard python profiling tools. I don't see a substantial difference between the two algorithms, your implementation, though, relies on native string comparison done by the runtime, the other implementation does it character by character in pure python. This is far slower. Commented Dec 17, 2016 at 13:44
  • In Python it is not uncommon for algorithmically "slower" approaches which better exploit built-in methods (running in optimized compiled C) perform better than algorithmically "faster" algorithms which are implemented in interpreted code. That is one of those inefficiencies which can be hidden in a constant of proportionality. Commented Dec 17, 2016 at 13:48
  • 1
    The second method is how native string comparison works in Python. If the first characters don't match, it doesn't bother to look at the rest. The difference is that it is implemented in a more optimized way. Commented Dec 17, 2016 at 13:51

2 Answers 2

5

Your two algorithms are essentially the same (as @Zah pointed out) with the only difference that the inner loop in the second algorithm is done by the underlying C code in the first algorithm. What you are observing is the difference between compiled and interpreted code.

If you want all indices and want to exploit the built-in methods:

def findAll(s,t):
    """returns all indices where substring t occurs in string s"""
    indices = []
    i = s.find(t)
    while i > -1:
        indices.append(i)
        i = s.find(t,i+1)
    return indices

For example,

>>> findAll("The cat in the hat","at")
[5, 16]
Sign up to request clarification or add additional context in comments.

3 Comments

So I am better using native methods in 'competitive programming` rather than trying to code my own string searches?
cool :) this is much faster than my implementation. Python native find uses Bayers Moore with bells and whistles on top. So I think I am better using this always.
@garg10may "competitive programming" in the sense of programming contests will presumably have rules about what is and is not permissible, rules which probably differ between various contests. If all you care about is efficiency -- Python code which uses built-in methods is faster than logically equivalent code that just uses loops and indices.
0

I have implemented naive search code in PYTHON in simple understanding as below. It will return no of time pattern got found.

def naive_pattern_search(data,search):
n = len(data) #Finding length of data
m = len(search) #Finding length of pattern to be searched.

i = 0
count = c = 0 #Taking for counting pattern if exixts.
for j in range(m-1):#Loop continue till length of pattern to be Search.
    while i <= (n-1):#Data loop
        
        #if searched patten length reached highest index at that time again initilize with 0.
        if j > (m-1):
            j = 0
        
        #Data and search have same element then both Index increment by 1.
        if data[i]==search[j]:
            #print(f"\n{ data[i] } { search[j] }")
            #print(f"i : {i}  {data[i]}   j : {j}  {search[j]}")
            i+=1
            j+=1
            count+=1
            
            #If one pattern compared and found Successfully then Its Counter for pattern.
            if count== (m-1):
                c = c + 1
        #Initilise pattern again with 0 for searching with next element in data.
        else:
            j = 0 #Direct move to 0th index.
            i+=1
            count=0 #If data not found as per pattern continuously then it will start counting from 0 again.

#Searched pattern occurs more then 0 then its Simply means that pattern found.
if c > 0:
    return c;
else:
    return -1;

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.