16

I have to find the count of a substring in a string using the C language. I'm using the function strstr but it only finds the first occurrence.

My idea of the algorithm is something like searching in the string while strstr does not return null and to substring the main string on each loop. My question is how to do that?

6 Answers 6

41

You could do something like

int countString(const char *haystack, const char *needle){
    int count = 0;
    const char *tmp = haystack;
    while(tmp = strstr(tmp, needle))
    {
        count++;
        tmp++;
    }
    return count;
}

That is, when you get a result, start searching again at the next position of the string.

strstr() doesn't only work starting from the beginning of a string but from any position.

Sign up to request clarification or add additional context in comments.

5 Comments

If they need to be distinct substrings, you might consider count+=strlen(string2find)
Edit, I added protection against problems in case of string2find=""
@Dave, beware of infinite loop for ""
@Dave and future readers, I believe you meant tmp += strlen(string2find). In your example, you are incrementing the number of occurrences by the length of the string!
if you are finding "zz" in "zzzz" it would return 3 and (with tmp++) I believe this is correct answer, if you do something like tmp += strlen(string2find) this would just return with 2.
6

Should already processed parts of the string should be consumed or not?

For example, what's the expect answer for case of searching oo in foooo, 2 or 3?

  • If the latter (we allow substring overlapping, and the answer is three), then Joachim Isaksson suggested the right code.

  • If we search for distinct substrings (the answer should be two), then see the code below (and online example here):

    char *str = "This is a simple string";
    char *what = "is";
    
    int what_len = strlen(what);
    int count = 0;
    
    char *where = str;
    
    if (what_len) 
        while ((where = strstr(where, what))) {
            where += what_len;
            count++;
        }
    

Comments

6

USE KMP and you can do it in O(n)

int fail[LEN+1];
char s[LEN];
void getfail()
{
    //f[i+1]= max({j|s[i-j+1,i]=s[0,j-1],j!=i+1})
    //the correctness can be proved by induction
    for(int i=0,j=fail[0]=-1;s[i];i++)
    {
        while(j>=0&&s[j]!=s[i]) j=fail[j];
        fail[i+1]=++j;
        if (s[i+1]==s[fail[i+1]]) fail[i+1]=fail[fail[i+1]];//optimizing fail[]
    }
}

int kmp(char *t)// String s is pattern and String t is text!
{
    int cnt=0;
    for(int i=0,j=0;t.s[i];i++)
    {
        while(j>=0&&t.s[i]!=s[j]) j=fail[j];
        if (!s[++j])
        {
            j=fail[j];
            cnt++;
        }
    }
    return cnt;// how many times s appeared in t.
}

Comments

2

The results can be different depending whether you allow an overlap or not:

// gcc -std=c99
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

static int
count_substr(const char *str, const char* substr, bool overlap) {
  if (strlen(substr) == 0) return -1; // forbid empty substr

  int count = 0;
  int increment = overlap ? 1 : strlen(substr);
  for (char* s = (char*)str; (s = strstr(s, substr)); s += increment)
    ++count;
  return count;
}

int main() {
  char *substrs[] = {"a", "aa", "aaa", "b", "", NULL };
  for (char** s = substrs; *s != NULL; ++s)
    printf("'%s' ->  %d, no overlap: %d\n", *s, count_substr("aaaaa", *s, true),
       count_substr("aaaaa", *s, false));
}

Output

'a' ->  5, no overlap: 5
'aa' ->  4, no overlap: 2
'aaa' ->  3, no overlap: 1
'b' ->  0, no overlap: 0
'' ->  -1, no overlap: -1

Comments

0

Assuming s and substr are non-null and non-empty:

/* #times substr appears in s, no overlaps */
int nappear(const char *s, const char *substr)
{
    int n = 0;
    const char *p = s;

    size_t lenSubstr = strlen(substr);

    while (*p) {
        if (memcmp(p, substr, lenSubstr) == 0) {
            ++n;
            p += lenSubstr;
        } else 
            ++p;
    }
    return n;
}

Comments

-2
/* 
 * C Program To Count the Occurence of a Substring in String 
 */
#include <stdio.h>
#include <string.h>

char str[100], sub[100];
int count = 0, count1 = 0;

void main()
{
    int i, j, l, l1, l2;

    printf("\nEnter a string : ");
    scanf("%[^\n]s", str);

    l1 = strlen(str);

    printf("\nEnter a substring : ");
    scanf(" %[^\n]s", sub);

    l2 = strlen(sub);

    for (i = 0; i < l1;)
    {
        j = 0;
        count = 0;
        while ((str[i] == sub[j]))
        {
            count++;
            i++;
            j++;
        }
        if (count == l2)
        {
            count1++;                                   
            count = 0;
        }
        else
            i++;
    }    
    printf("%s occurs %d times in %s", sub, count1, str);
}

3 Comments

Don't use global variables for no reason. void main is wrong; should be int main. "%[^\n]s" doesn't do what you want; the s is not part of the % directive and requires a literal s to be entered. You didn't specify an upper bound for inputs; this is a potential buffer overflow. Always check the return value of scanf if you have to use it. Don't use scanf for user input. strlen returns size_t, not int. You have redundant parentheses in the while condition; while not a bug per se, this silences the warning gcc would give you if you typo'd == as =.
The while loop doesn't check for end-of-string and can run off the end of str and sub if all characters match. j and count are always set together; they're effectively the same variable. Your algorithm is completely broken: It doesn't find e.g. "ab" in "aab".
In general, avoid posting answers that consist of code only. A description of the algorithm or an explanation of how your answer is different from the others would help.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.