python str.index time complexity

Question

For finding the position of a substring, inside a string, a naive algorithm will take O(n^2) time. However, using some efficient algorithms (eg KMP algorithm), this can be achieved in O(n) time:

s = 'saurabh'
w = 'au'

def get_table():
    i = 0; j = 2 
    t = []
    t.append(-1); t.append(0)
    while i < len(w):
        if w[i] == w[j-1]:
            t.append(j+1)
            j += 1
        else:
            t.append(0)
            j = 0 
        i += 1
    return t

def get_word():
    t = get_table()
    i = j = 0 
    while i+j < len(s):
        if w[j] == s[i+j]:
            if j == len(w) - 1:
                return i
            j += 1
        else:
            if t[j] > -1: 
                i = i + j - t[j]
                j = t[j]
            else:
                i += 1
    return -1

if __name__ == '__main__':
    print get_word()

However, if we do: 'saurabh'.index('ra'), does it internally uses some efficient algorithm to compute this in O(n) or it uses naive algorithm of complexity O(n^2) ?

You could profile it and see if the time grows exponentially or linearly ;) — Joel Cornett
– Joel Cornett, Commented Apr 26, 2016 at 14:52

alpert · Accepted Answer · 2016-04-26 15:07:21Z

4

In that article [1] author goes through the algoritm and explains it. From article:

The function “fastsearch” is called. It is a mix between 
Boyer-Moore and Horspool algorithms plus couple of neat tricks.

And from the wiki page of Boyer–Moore–Horspool algorithm [2]:

The algorithm trades space for time in order to obtain an 
average-case complexity of O(N) on random text, although 
it has O(MN) in the worst case, where the length of the 
pattern is M and the length of the search string is N.

Hope that helps!

[1] http://www.laurentluce.com/posts/python-string-objects-implementation

[2] https://en.wikipedia.org/wiki/Boyer%E2%80%93Moore%E2%80%93Horspool_algorithm

answered Apr 26, 2016 at 15:07

alpert

4,6751 gold badge21 silver badges28 bronze badges

Sign up to request clarification or add additional context in comments.

2 Comments

Saurabh Verma Over a year ago

But KMP's worst case time is still linear. Does this mean we should implement our code using KMP algorithm instead of python's builtin index() for time critical processes ?

alpert Over a year ago

I think that thread has a good answer about that topic: programmers.stackexchange.com/questions/183725/…

Community · Accepted Answer · 2017-05-23 11:53:11Z

0

its a combination of a few algorithms- look at this

Python string 'in' operator implementation algorithm and time complexity

or this

http://effbot.org/zone/stringlib.htm

edited May 23, 2017 at 11:53

CommunityBot

11 silver badge

answered Apr 26, 2016 at 15:03

Jonathan

4964 silver badges13 bronze badges

Comments

Francesco · Accepted Answer · 2016-04-26 19:26:04Z

-1

Sometimes you can get a quick answer just by trying

>>> timeit.timeit('x.index("ra")', setup='x="a"*100+"ra"')
0.4658635418727499
>>> timeit.timeit('x.index("ra")', setup='x="a"*200+"ra"')
0.7199222409243475
>>> timeit.timeit('x.index("ra")', setup='x="a"*300+"ra"')
0.9555441829046458
>>> timeit.timeit('x.index("ra")', setup='x="a"*400+"ra"')
1.1994099491303132
>>> timeit.timeit('x.index("ra")', setup='x="a"*500+"ra"')
1.4850994427915793

answered Apr 26, 2016 at 19:26

Francesco

4,3002 gold badges25 silver badges31 bronze badges

Collectives™ on Stack Overflow

python str.index time complexity

3 Answers 3

2 Comments

Comments

Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

3 Answers 3

2 Comments

Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related