Improving Python function to count occurrences of a substring

Question

I'm a relatively inexperienced programmer (but pretty experienced in general) and am looking to improve my Python skills (my language of choice). I've written some useful tools with Python but really want to take my programming/scripting to the next level. I understand the logic but lack familiarity with much of the library. I've been practicing simple programming tasks in Python, and my most recent practice example is a function that takes a string and a substring and outputs the number of occurrences of the substring within the string:

from re import match

def MyFunc(string, substring):
    n = len(substring)
    substring_count = 0
    x = 0
    for char in string:
        if match(substring, string[x:x+n]):
            substring_count = substring_count + 1
        x = x + 1
    return substring_count

Is this an efficient way of doing this? Is my code particularly Pythonish? I also tried another solution without using regex but wasn't nearly as successful.

This code can definitely be improved, but I'm not sure that Stack Overflow is the right site for this sort of question...
– mgilson, Commented Feb 25, 2015 at 16:23
I know Vivek Sable has answered the question, but here are some tips for general use: regex is terrible on performance and readability, so don't use it if you have a built-in alternative. Also, try x += 1 instead of x = x + 1.
– no_name, Commented Feb 25, 2015 at 16:25

Vivek Sable · Accepted Answer · 2015-02-25 16:28:24Z

2

Use the string count method to get the number of occurrences of a substring in the main string.

Description:

str.count(sub[, start[, end]])

Return the number of non-overlapping occurrences of substring sub in the range [start, end]. The optional arguments start and end are interpreted as in slice notation, including negative values.

e.g.

>>> a = "aabbbffgghhtt"
>>> a.count("ab")
1
>>> a.count("b")
3
>>> a.count("x")
0
>>>
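As a small illustration of the optional start and end arguments and of the non-overlapping behavior described above, the following sketch reuses the same example string (the variable name text is only for illustration):

```python
text = "aabbbffgghhtt"

# Count "b" only within text[3:], i.e. "bbffgghhtt".
print(text.count("b", 3))      # 2

# Count "b" only within text[0:3], i.e. "aab".
print(text.count("b", 0, 3))   # 1

# Matches do not overlap: "bbb" contains "bb" once by this count, not twice.
print(text.count("bb"))        # 1
```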

edited Feb 25, 2015 at 16:28

answered Feb 25, 2015 at 16:22

Vivek Sable


3 Comments

DSM Over a year ago

This is not the same as the OP's code, because it treats overlaps differently. For example, MyFunc("aaaa","aa") == 3, but "aaaa".count("aa") == 2.

LegendaryDude Over a year ago

That was my goal -- I wanted to catch the overlaps, not just distinct instances of each.

Ben Morris Over a year ago

@devOpsEv then JuniorCompressor's answer is the best one.

JuniorCompressor · Accepted Answer · 2015-02-25 16:51:50Z

1

Using regular expressions for non-overlapping searches:

import re

def MyFunc(s, sub):
    return len(re.compile(re.escape(sub)).findall(s))

For overlapping:

def MyFunc(s, sub):
    n, m = len(sub), len(s)
    return sum(sub == s[i:i + n] for i in range(m - n + 1))
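To make the difference between the two versions concrete, here is a rough sketch that gives them separate names so they can be called side by side. The names count_non_overlapping, count_overlapping and count_overlapping_find are hypothetical and not part of the answer; the last one is an assumed alternative built on str.find that jumps from match to match instead of slicing at every index.

```python
import re


def count_non_overlapping(s, sub):
    # Same idea as the regex version above: escape sub so it matches literally.
    return len(re.findall(re.escape(sub), s))


def count_overlapping(s, sub):
    # Same idea as the slicing version above: test every possible start index.
    n = len(sub)
    return sum(sub == s[i:i + n] for i in range(len(s) - n + 1))


def count_overlapping_find(s, sub):
    # Hypothetical variant: restart the search one character after each
    # match so that overlapping occurrences are counted too.
    if not sub:
        raise ValueError("sub must be non-empty")
    count = 0
    start = s.find(sub)
    while start != -1:
        count += 1
        start = s.find(sub, start + 1)
    return count


print(count_non_overlapping("aaaa", "aa"))   # 2
print(count_overlapping("aaaa", "aa"))       # 3
print(count_overlapping_find("aaaa", "aa"))  # 3
```

The str.find variant rejects an empty substring explicitly, since the search would otherwise have no sensible stopping rule to document.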

The problem you want to solve is one that the Knuth-Morris-Pratt algorithm solves more efficiently.
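For reference, a minimal sketch of a Knuth-Morris-Pratt based counter might look like the following. The function name kmp_count is hypothetical, and this is one common formulation of the algorithm rather than code from the answer. It counts overlapping occurrences and runs in time proportional to len(s) + len(sub), whereas the slicing version can take time proportional to len(s) * len(sub) in the worst case.

```python
def kmp_count(s, sub):
    """Count occurrences of sub in s, including overlapping ones."""
    if not sub:
        raise ValueError("sub must be non-empty")

    # failure[i] is the length of the longest proper prefix of sub[:i + 1]
    # that is also a suffix of it.
    failure = [0] * len(sub)
    k = 0
    for i in range(1, len(sub)):
        while k > 0 and sub[i] != sub[k]:
            k = failure[k - 1]
        if sub[i] == sub[k]:
            k += 1
        failure[i] = k

    count = 0
    k = 0  # number of characters of sub matched so far
    for ch in s:
        while k > 0 and ch != sub[k]:
            k = failure[k - 1]
        if ch == sub[k]:
            k += 1
        if k == len(sub):
            count += 1
            # Fall back to the longest border so overlapping matches are found.
            k = failure[k - 1]
    return count


print(kmp_count("aaaa", "aa"))  # 3
```

For short strings the simpler versions are usually easier to read; the sketch mainly matters when both the string and the substring are long and repetitive.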

edited Feb 25, 2015 at 16:51

answered Feb 25, 2015 at 16:25

JuniorCompressor


Ben Morris · Accepted Answer · 2015-02-25 16:27:58Z

0

If you want to write your own function using only built-in Python functionality, use this:

def MyFunc(string, substring):
    return string.count(substring)

answered Feb 25, 2015 at 16:27

Ben Morris


1 Comment

LegendaryDude Over a year ago

As stated in comments to other answers, count behaves differently as it does not catch the overlapping substrings.
