2
def subStringMatchExact(target, key):

    if (target.find(key) == -1):
        return []
    else:
        foundStringAt = [target.find(key)]
        target = target[foundStringAt[0] + len(key):]
        return foundStringAt + subStringMatchExact(target, key)

string = subStringMatchExact("your code works with wrongly correlated coefficients which incorporates more costs", "co") 

print(string)

Current incorrect output:

[5, 22, 9, 19, 14]

I am having trouble summing the length of the substring on the previous recursion step. Like the second element of the list should be 29 instead of 22 as in len(previousSubstring) + len(key) - 1 + len(currentSubstring).

Any ideas to improve my code and/or fix my error too?

1
  • I guess, by the print(), that this is Python 3 right? Commented Dec 19, 2011 at 5:25

4 Answers 4

5

The fast way

You don't have to implement your own solution, its already done! Use the finditer function from the re module:

>>> import re
>>> s = 'your code works with wrongly correlated coefficients which incorporates more costs'
>>> matches = re.finditer('co', s)
>>> positions = [ match.start() for match in matches ]
>>> positions
[5, 29, 40, 61, 77]

Your own way

If you want to make your own implementation (using recursion) you could take advantage of the extra arguments of the str.find function. Lets see what help(str.find) says about it:

S.find(sub [,start [,end]]) -> int

    Return the lowest index in S where substring sub is found,
    such that sub is contained within s[start:end].  Optional
    arguments start and end are interpreted as in slice notation.

    Return -1 on failure.

There is an extra argument called start that tells str.find where to start searching the substring. That's just what we need!

So, modifying your implementation, we can get a simple, fast and beautiful solution:

def substring_match_exact(pattern, string, where_should_I_start=0):
    # Save the result in a variable to avoid doing the same thing twice
    pos = string.find(pattern, where_should_I_start)
    if pos == -1:
        # Not found!
        return []
    # No need for an else statement
    return [pos] + substring_match_exact(pattern, string, pos + len(key))

What is the recursion doing here?

  • You're first searching the substring in the string starting at position 0.
  • If the substring wasn't found, an empty list is returned [].
  • If the substring was found, it will be returned [pos] plus all the positions where the substring will appear in the string starting at position pos + len(key).

Using our brand new function

>>> s = 'your code works with wrongly correlated coefficients which incorporates more costs'
>>> substring_match_exact('co', s)
[5, 29, 40, 61, 77]
Sign up to request clarification or add additional context in comments.

4 Comments

Oh, regular expressions. Well know I know how to do it efficiently, but still any input on my code? I am doing the MIT 6.00 OCW and just read about recursion. That is why I am focused on doing it by recursion. I know the theory behind the topic, but I want to get this "harder" practical problem done. Factorials and Fibonacci were too easy to make sure I grasp this concept correctly.
@cRaZiRiCaN: recursion is indeed a very interesting concept. Take a look at the other answers to fully understand what was the problem with your solution.
Every basic programming problem can be regarded as being a Homework problem so I understand your reluctance to provide a "solution" (just the mention of an offset approach would of have sufficed too, for example). But thank you very much for the input on the re module. Did not know about that. EDIT: This was before I saw your edit, thanks again.
Nice answer! Just one small typo matches = re.finditer(s, 'co') should be matches = re.finditer('co', s) instead ;)
4

Currently, your code is attempting to find the index of co in the shortened string, rather than the original string. Therefore, while [5, 22, 9, 19, 14] may seem incorrect, the script is doing exactly what you told it to do. By including an offset, like the script below, this code could work.

def subStringMatchExact(target, key, offset=0): # note the addition of offset

    if (target.find(key) == -1):
        return []
    else:
        foundStringAt = target.find(key)
        target = target[foundStringAt + len(key):]
        foundStringAt += offset # added
        return [foundStringAt] + subStringMatchExact(target, key, foundStringAt + len(key))
        # added foundStringAt + len(key) part

string = subStringMatchExact("your code works with wrongly correlated coefficients which incorporates more costs", "co") 
# no need to call w/ 0 since offset defaults to 0 if no offset is given

print(string)

I should add that making foundStringAt a list from the beginning isn't great practice when dealing with only one value, as you add some overhead with every [0] index lookup. Instead, since you want a list return type, you should just enclose it in [] in the return statement (as shown in my code).

1 Comment

"overhead" isn't the issue; complexity is.
2

You are always adding the position in the respective substring. In

return foundStringAt + subStringMatchExact(target, key)

, the result of the function call is related to the "new" string target which is different from the "Old" one, as it was redefined with target = target[foundStringAt[0] + len(key):].

So you should add exactly this value to the function call results:

    foundStringAt = target.find(key)
    offset = foundStringAt + len(key)
    target = target[offset:]
    return [foundStringAt] + [i + offset for i in subStringMatchExact(target, key)]

should do the trick (untested).

Comments

0

I wouldn't bother using recursion for this, except as an exercise.

To fix the problem:

I am having trouble summing the length of the substring on the previous recursion step.

What you really want to "sum" is the amount of the string that has already been searched. Pass this to the function as a parameter (use 0 for the first call), adding the amount of string removed (foundStringAt[0] + len(key):, currently) to the input value for the recursive call.

As a matter of formatting (and to make things correspond better to their names), you'll probably find it neater to let foundStringAt store the result directly (instead of that 1-element list), and do the list wrapping as part of the expression with the recursive call.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.