Displaying multiple substring indices within a string by using recursion

Question

def subStringMatchExact(target, key):

    if (target.find(key) == -1):
        return []
    else:
        foundStringAt = [target.find(key)]
        target = target[foundStringAt[0] + len(key):]
        return foundStringAt + subStringMatchExact(target, key)

string = subStringMatchExact("your code works with wrongly correlated coefficients which incorporates more costs", "co") 

print(string)

Current incorrect output:

[5, 22, 9, 19, 14]

I am having trouble summing the length of the substring on the previous recursion step. Like the second element of the list should be 29 instead of 22 as in len(previousSubstring) + len(key) - 1 + len(currentSubstring).

Any ideas to improve my code and/or fix my error too?

I guess, by the print(), that this is Python 3 right?

juliomalegria
– juliomalegria

2011-12-19 05:25:13 +00:00
Commented Dec 19, 2011 at 5:25 — juliomalegria
– juliomalegria, Commented Dec 19, 2011 at 5:25

juliomalegria · Accepted Answer · 2012-07-26 16:14:28Z

5

The fast way

You don't have to implement your own solution, its already done! Use the finditer function from the re module:

>>> import re
>>> s = 'your code works with wrongly correlated coefficients which incorporates more costs'
>>> matches = re.finditer('co', s)
>>> positions = [ match.start() for match in matches ]
>>> positions
[5, 29, 40, 61, 77]

Your own way

If you want to make your own implementation (using recursion) you could take advantage of the extra arguments of the str.find function. Lets see what help(str.find) says about it:

S.find(sub [,start [,end]]) -> int

    Return the lowest index in S where substring sub is found,
    such that sub is contained within s[start:end].  Optional
    arguments start and end are interpreted as in slice notation.

    Return -1 on failure.

There is an extra argument called start that tells str.find where to start searching the substring. That's just what we need!

So, modifying your implementation, we can get a simple, fast and beautiful solution:

def substring_match_exact(pattern, string, where_should_I_start=0):
    # Save the result in a variable to avoid doing the same thing twice
    pos = string.find(pattern, where_should_I_start)
    if pos == -1:
        # Not found!
        return []
    # No need for an else statement
    return [pos] + substring_match_exact(pattern, string, pos + len(key))

What is the recursion doing here?

You're first searching the substring in the string starting at position 0.
If the substring wasn't found, an empty list is returned [].
If the substring was found, it will be returned [pos] plus all the positions where the substring will appear in the string starting at position pos + len(key).

Using our brand new function

>>> s = 'your code works with wrongly correlated coefficients which incorporates more costs'
>>> substring_match_exact('co', s)
[5, 29, 40, 61, 77]

edited Jul 26, 2012 at 16:14

answered Dec 19, 2011 at 5:17

juliomalegria

25k14 gold badges77 silver badges89 bronze badges

Sign up to request clarification or add additional context in comments.

4 Comments

user810606 Over a year ago

Oh, regular expressions. Well know I know how to do it efficiently, but still any input on my code? I am doing the MIT 6.00 OCW and just read about recursion. That is why I am focused on doing it by recursion. I know the theory behind the topic, but I want to get this "harder" practical problem done. Factorials and Fibonacci were too easy to make sure I grasp this concept correctly.

juliomalegria Over a year ago

@cRaZiRiCaN: recursion is indeed a very interesting concept. Take a look at the other answers to fully understand what was the problem with your solution.

user810606 Over a year ago

Every basic programming problem can be regarded as being a Homework problem so I understand your reluctance to provide a "solution" (just the mention of an offset approach would of have sufficed too, for example). But thank you very much for the input on the re module. Did not know about that. EDIT: This was before I saw your edit, thanks again.

Tuxdude Over a year ago

Nice answer! Just one small typo matches = re.finditer(s, 'co') should be matches = re.finditer('co', s) instead ;)

Edwin · Accepted Answer · 2011-12-19 05:30:09Z

4

Currently, your code is attempting to find the index of co in the shortened string, rather than the original string. Therefore, while [5, 22, 9, 19, 14] may seem incorrect, the script is doing exactly what you told it to do. By including an offset, like the script below, this code could work.

def subStringMatchExact(target, key, offset=0): # note the addition of offset

    if (target.find(key) == -1):
        return []
    else:
        foundStringAt = target.find(key)
        target = target[foundStringAt + len(key):]
        foundStringAt += offset # added
        return [foundStringAt] + subStringMatchExact(target, key, foundStringAt + len(key))
        # added foundStringAt + len(key) part

string = subStringMatchExact("your code works with wrongly correlated coefficients which incorporates more costs", "co") 
# no need to call w/ 0 since offset defaults to 0 if no offset is given

print(string)

I should add that making foundStringAt a list from the beginning isn't great practice when dealing with only one value, as you add some overhead with every [0] index lookup. Instead, since you want a list return type, you should just enclose it in [] in the return statement (as shown in my code).

edited Dec 19, 2011 at 5:30

answered Dec 19, 2011 at 5:17

Edwin

2,1141 gold badge21 silver badges41 bronze badges

1 Comment

Karl Knechtel Over a year ago

"overhead" isn't the issue; complexity is.

glglgl · Accepted Answer · 2011-12-19 05:28:08Z

2

You are always adding the position in the respective substring. In

return foundStringAt + subStringMatchExact(target, key)

, the result of the function call is related to the "new" string target which is different from the "Old" one, as it was redefined with target = target[foundStringAt[0] + len(key):].

So you should add exactly this value to the function call results:

    foundStringAt = target.find(key)
    offset = foundStringAt + len(key)
    target = target[offset:]
    return [foundStringAt] + [i + offset for i in subStringMatchExact(target, key)]

should do the trick (untested).

answered Dec 19, 2011 at 5:28

glglgl

91.5k13 gold badges157 silver badges230 bronze badges

Comments

Karl Knechtel · Accepted Answer · 2011-12-19 05:33:57Z

I wouldn't bother using recursion for this, except as an exercise.

To fix the problem:

I am having trouble summing the length of the substring on the previous recursion step.

What you really want to "sum" is the amount of the string that has already been searched. Pass this to the function as a parameter (use 0 for the first call), adding the amount of string removed (foundStringAt[0] + len(key):, currently) to the input value for the recursive call.

As a matter of formatting (and to make things correspond better to their names), you'll probably find it neater to let foundStringAt store the result directly (instead of that 1-element list), and do the list wrapping as part of the expression with the recursive call.

Collectives™ on Stack Overflow

Displaying multiple substring indices within a string by using recursion

4 Answers 4

The fast way

Your own way

What is the recursion doing here?

Using our brand new function

4 Comments

1 Comment

Comments

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

4 Answers 4

The fast way

Your own way

What is the recursion doing here?

Using our brand new function

4 Comments

1 Comment

Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Related