1

I am trying to stop spam, and a lot of it consists of recurring words. Is there any way to check a string to see whether it contains a recurring phrase?

$string = 'Hello ! Hello ! Hello ! Hello !';

Thanks,

    What about using Akismet instead of rolling your own anti-spam solution? I once wrote a spam filter for generic text, and it's hard to filter without getting false positives. Akismet is ready to use. Commented Apr 23, 2012 at 13:20

3 Answers

3

substr_count is fine when you know what you're looking for. If you don't know what the spam word is, you can use str_word_count and array_count_values:

$string = 'Hello! Hello! Hello! Hello! Lorem Ipsum';
$words = str_word_count($string, 1);
$count = array_count_values($words);
print_r($count);

This will give you this:

Array
(
    [Hello] => 4
    [Lorem] => 1
    [Ipsum] => 1
)

You can arsort() this array to get a ranking of the most used words in the string (a plain sort() would discard the word keys). You should also check for stopwords (like "and", "or", "me" and such).
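To rank the counts while keeping each word attached to its count, arsort() is one option. The following is a minimal sketch that reuses the example string from this answer; the $topWord variable is only illustrative, and array_key_first() assumes PHP 7.3 or later.

```php
<?php
// Count each word, then rank by frequency while keeping the word keys.
$string = 'Hello! Hello! Hello! Hello! Lorem Ipsum';
$count = array_count_values(str_word_count($string, 1));

// arsort() sorts by value, highest first, and preserves the keys.
arsort($count);

// The first key is now the most frequent word.
$topWord = array_key_first($count); // requires PHP 7.3+
echo $topWord . ' => ' . $count[$topWord] . PHP_EOL; // Hello => 4
```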


2 Comments

Is there any way to perform a function if one of those words is used more than, say, 10 times, excluding common words like a, the, my, he, she, that?
You'll need a list of stopwords in an array, then loop through the $count list. I believe there is no PHP function that solves the stopword problem on its own. Perhaps array_filter can help you.
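The stopword-and-threshold idea from these comments could be sketched roughly as follows. The function name findOverusedWords, the sample text, and the default limit of 10 are assumptions made for illustration; the stopword list is the one from the comment and is expected to be lowercase. Arrow functions need PHP 7.4 or later.

```php
<?php
// Hypothetical helper: returns the words used more than $limit times,
// ignoring a caller-supplied (lowercase) stopword list.
function findOverusedWords(string $text, array $stopwords, int $limit = 10): array
{
    // Lowercase first so "Hello" and "hello" count as the same word.
    $words = str_word_count(strtolower($text), 1);
    $counts = array_count_values($words);

    // Drop stopwords by key, then keep only the counts above the limit.
    $counts = array_diff_key($counts, array_flip($stopwords));
    return array_filter($counts, fn($n) => $n > $limit);
}

$stopwords = ['a', 'the', 'my', 'he', 'she', 'that'];
$text = str_repeat('buy the pills ', 11); // made-up sample input
$overused = findOverusedWords($text, $stopwords);

if ($overused) {
    // Perform whatever action is needed here, e.g. flag the post.
    echo 'Possible spam: ' . implode(', ', array_keys($overused)) . PHP_EOL;
}
```

Here array_diff_key() removes the stopwords in one call, so no explicit loop over $count is needed; array_filter() then applies the threshold.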
1

A quick Google search turned up this:

http://php.net/manual/en/function.substr-count.php

However, there are better anti-spam ideas, such as CAPTCHAs. Human spammers are pretty difficult to catch; they'll find a way around your word counter. Maybe you should consider user-based spam reporting, like YouTube does.


0

You could try substr_count(): http://php.net/manual/en/function.substr-count.php

$string = 'hello ! hello ! hello ! hello !';
echo substr_count($string, 'hello');

Of course if you want to check whether ANY of the words in your string occur multiple times... then this becomes a lot less efficient. You'd probably have to keep track of a list of 'checked words' and, for each not-yet-checked word in your string check whether it occurs multiple times.
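Keeping track of the words seen so far can be done in a single pass with an associative array, which also yields a simple yes/no answer. This is only a sketch: hasRepeatedWord and its $minRepeats parameter are hypothetical names, not built-in PHP functions.

```php
<?php
// Hypothetical helper: returns true as soon as any word has appeared
// $minRepeats times. One pass over the words with a "seen" tally.
function hasRepeatedWord(string $text, int $minRepeats = 2): bool
{
    $seen = [];
    foreach (str_word_count(strtolower($text), 1) as $word) {
        $seen[$word] = ($seen[$word] ?? 0) + 1;
        if ($seen[$word] >= $minRepeats) {
            return true; // stop early; no need to scan the rest
        }
    }
    return false;
}

var_dump(hasRepeatedWord('hello ! hello ! hello ! hello !', 3)); // bool(true)
var_dump(hasRepeatedWord('Lorem ipsum dolor sit amet'));         // bool(false)
```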

Like Binarious mentioned, a captcha would be a nicer way to stop spam ;-)

1 Comment

The thing is, I don't know what the repeated string is. I thought there was a boolean function for it...
