Finding a string multiple times in another String - Python

Question

I'm trying to check whether a string exists in another string without using Python's built-in methods such as find and index.

My function takes two strings as parameters: the first is the string to search, and the second is the string to look for in the first.

If the second string exists in the first I want my function to return all the positions it occurs in the first string.

Right now, my function is able to find the first occurrence and return an index, however I want to find multiple occurrences instead of just the first one.

Below is my code:

def multi_find (s, r):

    s_len = len(s)
    r_len = len(r)

    if s_len < r_len:
        n = -1
    else:
        m = s_len - r_len
        n = -1  # assume r is not yet found in s
        i = 0

        while n == -1 and i < m:
            # search for r in s until not enough characters are left
            if s[i:i + r_len] == r:
                n = i
            else:
                i = i + 1
    print (n)

multi_find("abcdefabc. asdli! ndsf acba saa abe?", "abc")

Right now, this outputs just "0" because that's where abc first occurs. How can I get it to return "0" and "6" (the beginning of the second occurrence), i.e. keep checking after it has found one?

I was thinking of creating a list of all the places it occurs and appending i to that list, but when I tried that, nothing worked.

Ashwini Chaudhary · Accepted Answer · 2013-11-07 17:49:39Z

9

You can do:

>>> haystack = "abcdefabc. asdli! ndsf acba saa abe?"
>>> needle = "abc"
>>> for i, _ in enumerate(haystack):
...     if haystack[i:i + len(needle)] == needle:
...         print (i)
...
0
6
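To return the positions instead of printing them, the same loop can be wrapped in a function. This is a minimal sketch, assuming the name find_all (not from the original post) and assuming an empty needle should yield no matches. It uses only slicing and len, and stops at the last index where needle can still fit.

```python
def find_all(haystack, needle):
    """Return every index at which needle starts in haystack (overlaps included)."""
    if not needle:
        return []  # assumption: treat an empty needle as "no matches"
    positions = []
    # The last possible start is len(haystack) - len(needle), hence the + 1.
    for i in range(len(haystack) - len(needle) + 1):
        if haystack[i:i + len(needle)] == needle:
            positions.append(i)
    return positions


print(find_all("abcdefabc. asdli! ndsf acba saa abe?", "abc"))  # [0, 6]
```

If haystack is shorter than needle, the range is empty and the function returns an empty list, so no separate length check is needed.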

edited Nov 7, 2013 at 17:49

Ashwini Chaudhary

answered Nov 7, 2013 at 17:37

Simeon Visser

3 Comments

l00kitsjake Over a year ago

Can you explain to me the "_" in the 'for i, _ in enumerate(haystack):' line? Not really sure what this does.

Simeon Visser Over a year ago

@JacobMammoliti: it means you're ignoring the variable that's there. enumerate() allows you to iterate over the positions and characters of the string but we're not using the characters. Hence, we're only iterating over each of the positions in the string. You can also write for i, c in enumerate(haystack): to iterate over each position i and each character c (at the same time) of the string.

neil Over a year ago

Note that the "_" is just a convention to make it clear to human readers - it doesn't act any differently than if you had used "x".

Ashwini Chaudhary · Accepted Answer · 2013-11-07 18:40:30Z

3

Another alternative using regex:

>>> import re
>>> haystack = "abcdefabc. asdli! ndsf acba saa abe?"
>>> needle = "abc"
>>> [m.start() for m in re.finditer(r'{}'.format(re.escape(needle)), haystack)]
[0, 6]

The solution above does not find overlapping matches. For example, 'aa' occurs three times in 'aaaa' (at positions 0, 1 and 2), but only positions 0 and 2 are reported. To find overlapping matches as well, wrap the pattern in a lookahead:

>>> haystack = "bobob"
>>> needle = "bob"
>>> [m.start() for m in re.finditer(r'(?={})'.format(re.escape(needle)), haystack)]
[0, 2]
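The difference between the two patterns can be seen side by side on the 'aaaa' case. This is a small sketch; the function name regex_positions and its overlapping parameter are made up for illustration and are not part of the re module.

```python
import re


def regex_positions(haystack, needle, overlapping=False):
    """Return the start index of each match of needle in haystack."""
    pattern = re.escape(needle)  # treat needle as literal text
    if overlapping:
        # A lookahead matches without consuming characters, so the next
        # search can begin one position later and overlaps are reported.
        pattern = "(?=" + pattern + ")"
    return [m.start() for m in re.finditer(pattern, haystack)]


print(regex_positions("aaaa", "aa"))                    # [0, 2]
print(regex_positions("aaaa", "aa", overlapping=True))  # [0, 1, 2]
```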

edited Nov 7, 2013 at 18:40

answered Nov 7, 2013 at 17:47

Ashwini Chaudhary

2 Comments

Darren Stone Over a year ago

I love the use of re but since the goal is "to return all the positions it occurs in the first string", I think this fails to find some interesting cases where r occurs multiple times in s but its instances overlap. Consider the case: multi_find("bobob","bob"). Using your implementation, the string "bob" definitely occurs at position 2 in "bobob" but it is not returned. I love the one-liner but I thought I'd throw that in as a possible concern. It may not matter to @Jacob.

Ashwini Chaudhary Over a year ago

@DarrenStone Good point, added another solution that works for overlapping matches as well.

Siva Cn · Accepted Answer · 2013-11-07 17:43:46Z

1

def multi_find(s, r):

    s_len = len(s)
    r_len = len(r)

    _complete = []  # start index of every match found

    # Only search when r can fit in s; otherwise the list stays empty.
    if s_len >= r_len:
        for i in range(s_len - r_len + 1):
            # search for r in s until not enough characters are left
            if s[i:i + r_len] == r:
                _complete.append(i)
    print(_complete)

multi_find("abcdefabc. asdli! ndsf abc saa abe?", "abc")

answered Nov 7, 2013 at 17:43

Siva Cn

OGHaza · Accepted Answer · 2013-11-07 17:51:27Z

1

def multi_find(s, r):
    s_len = len(s)
    r_len = len(r)
    n = []  # positions where r has been found in s

    if s_len >= r_len:
        m = s_len - r_len
        i = 0

        # i <= m so that a match ending at the last character is also checked
        while i <= m:
            # search for r in s until not enough characters are left
            if s[i:i + r_len] == r:
                n.append(i)
            i = i + 1
    print(n)

multi_find("abcdefabc. asdli! ndsf acba saa abe?", "abc")

Pretty much just replace n with a list so you can keep adding values to it as you find them. You also need to increment i even when a match is found. Your original code would have been stuck in a loop forever at the first match, except that the while n == -1 condition made it stop as soon as a match was found.

answered Nov 7, 2013 at 17:51

OGHaza

vitiral · Accepted Answer · 2013-11-07 18:00:55Z

1

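Probably the best way to do this is to keep calling the find method, moving the start position forward after each match (this is likely the fastest approach too, since find is implemented in C).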

def multifind(string, value, start=0, stop=None):
    values = []
    while True:
        # look for the next occurrence at or after start
        found = string.find(value, start, stop)
        if found == -1:
            break
        values.append(found)
        start = found + 1  # step by one so overlapping matches are found
    return values

print(multifind('hello abc abc', 'abc'))

Output:

[6, 10]
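The amount by which start advances after a match decides whether overlapping matches are reported. The sketch below illustrates this; the name multifind_step and the overlapping flag are hypothetical, and an empty value is assumed to yield no matches.

```python
def multifind_step(string, value, overlapping=True):
    """Collect match positions with str.find, optionally skipping overlaps."""
    if not value:
        return []  # assumption: an empty search value has no matches
    # Advancing by 1 allows overlaps; advancing by len(value) skips them.
    step = 1 if overlapping else len(value)
    positions = []
    start = 0
    while True:
        found = string.find(value, start)
        if found == -1:
            return positions
        positions.append(found)
        start = found + step


print(multifind_step("bobob", "bob"))                     # [0, 2]
print(multifind_step("bobob", "bob", overlapping=False))  # [0]
```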

answered Nov 7, 2013 at 18:00

vitiral

1 Comment

joanis Over a year ago

I know it's late to comment, 8 years later, but the question was about doing this without calling pre-defined functions like find(), so this is not an answer to the question as asked.

Darren Stone · Accepted Answer · 2013-11-07 19:22:32Z

1

@Jacob, I hope you'll find this one very short yet still easy to understand.

def multi_find(s, r):
    return [pos for pos in range(len(s)) if s.startswith(r, pos)]

edited Nov 7, 2013 at 19:22

answered Nov 7, 2013 at 19:16

Darren Stone

Darren Stone · Accepted Answer · 2013-11-07 19:30:47Z

0

Note: I think this answer is still a good "teaching answer", but I have submitted a better solution, without recursion, elsewhere in this thread.

def multi_find(s, r, start=0):
    if start >= len(s): 
        return []
    if s.startswith(r, start):
        return [start] + multi_find(s, r, start+1)
    else:
        return multi_find(s, r, start+1)

This allows you to pass an optional start position to begin the search in s.

This solution is recursive, which may not be the fastest implementation, but it is correct, and I believe it makes it easy to identify each of the three possibilities at each position of s:

end of s
found another r
didn't find another r
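Because the recursive version makes one call per character of s, a long input can exceed Python's recursion limit (typically 1000 frames by default in CPython) and raise RecursionError. The same three cases can be handled in a loop. This is a sketch only; the name multi_find_iter is made up for illustration.

```python
def multi_find_iter(s, r, start=0):
    """Iterative equivalent of the recursive multi_find above."""
    positions = []
    pos = start
    while pos < len(s):           # case 1: stop once the end of s is reached
        if s.startswith(r, pos):  # case 2: found another r
            positions.append(pos)
        pos += 1                  # case 3 (and after case 2): try the next position
    return positions


print(multi_find_iter("bobob", "bob"))  # [0, 2]
```

Unlike the recursive version, this one also works on strings that are many thousands of characters long.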

edited Nov 7, 2013 at 19:30

answered Nov 7, 2013 at 18:10

Darren Stone
