3

I need help to create a script for finding keywords in a string, and inserting them into a database for use in a tag cloud.

  • The script would obviously need to dismiss non-word characters and common words like 'I', 'at', 'and', etc.
  • Get a value for the frequency of each keyword it finds, then insert it into the database if it's new, or update the existing row by adding the string's keyword count.
  • The string is unformatted text from a database row.

I'm not new to PHP, but I haven't attempted anything like this before, so any help is appreciated.

Thanks, Lea

2
  • What do you call a keyword? Every [long enough] word in a string? Commented Apr 3, 2011 at 13:37
  • I guess I could create a static array for it to compare to? Commented Apr 3, 2011 at 13:38

5 Answers 5

2

Google + php keywords from text = http://www.hashbangcode.com/blog/extract-keywords-text-string-php-412.html

1

Well, the answer is already there, but I'll still post my code for the little work that has gone into it.

I think that a MySQL database is not ideal for storing this kind of data. I would suggest something like memcachedb, so you can easily access a keyword by using it as an index to fetch the count from the db. Persisting those keywords in a high-load environment may cause problems with a MySQL database.

$keyWords = extractKeyWords($text);

saveWords($keyWords);

function extractKeyWords($text) {
    $result = array();

    if(preg_match_all('#([\w]+)\b#i', $text, $matches)) {
        foreach($matches[1] as $key => $match) {

            // encode found word to safely use as key in array
            $encodedKey = base64_encode(strtolower($match));

            if(wordIsValid($match)) {
                if(array_key_exists($encodedKey, $result)) {
                    $result[$encodedKey]++;
                } else {
                    $result[$encodedKey] = 1;
                }
            }
        }
    }

    return $result;
}

function wordIsValid($word) {
    $wordsToIgnore = array("to", "and", "if", "or", "by", "me", "you", "it", "as", "be", "the", "in");
    // don't use words with a single character
    if(strlen($word) > 1) {
        if(in_array(strtolower($word), $wordsToIgnore)) {
            return false;
        } else {
            return true;    
        }
    } else {
        return false;       
    }
}

// not implemented yet ;) for now this only prints each word and its count
function saveWords($arrayOfWords) {
    foreach($arrayOfWords as $word => $count) {
        echo base64_decode($word).":".$count."\n";
    }
}

0

You could approach this with either a dictionary of keywords or a dictionary of words to ignore. With a dictionary of keywords, count each time one is used and then update a database table with those counts. With a dictionary of words to ignore, strip those words from the posts and insert or update a count in the keyword table for each remaining word.
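As a rough illustration of the "words to ignore" variant, here is a minimal sketch. The function name `countKeywords()` and the `$stopWords` list are made up for this example, and it assumes plain ASCII text; treat it as a starting point rather than a finished implementation.

```php
<?php
// Hypothetical helper: countKeywords() and $stopWords are illustrative names,
// not part of PHP or of the asker's code base. Assumes ASCII input.
function countKeywords(string $text, array $stopWords): array
{
    // Lower-case first so "Cubs" and "cubs" count as the same keyword.
    $text = strtolower($text);

    // Split on anything that is not a letter, digit or apostrophe.
    $words = preg_split("/[^a-z0-9']+/", $text, -1, PREG_SPLIT_NO_EMPTY);

    // The "dictionary" of words to ignore, flipped so lookups can use isset().
    $ignore = array_flip(array_map('strtolower', $stopWords));

    $counts = array();
    foreach ($words as $word) {
        // Skip single characters and anything in the ignore dictionary.
        if (strlen($word) < 2 || isset($ignore[$word])) {
            continue;
        }
        $counts[$word] = ($counts[$word] ?? 0) + 1;
    }

    arsort($counts); // most frequent keywords first
    return $counts;
}

$counts = countKeywords('I at and Cubs PHP Cubs', array('I', 'at', 'and'));
print_r($counts); // cubs => 2, php => 1
```

The ignore list is flipped into array keys so that each lookup is a cheap `isset()` instead of a scan with `in_array()`. The same loop would work for the "dictionary of keywords" variant by inverting the check and counting only words that are in the list.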

1 Comment

Ok I see. In theory yes, but I don't know how to approach it practically. I can gather that I should create two arrays and use them as the "dictionaries".. but how do I do the counting, and ignoring? I'm new to using arrays so a practical example would help.
0

The way does it is by storing every word entered in every post in a table. When people search the forum, the result is the post IDs from which the words came. I suggest something like this.

Compare a user submission with your array of blacklisted (obvious) words, which would come from a database table. The words that survive are your keywords. Enter those keywords into your database table. Then use a SELECT * statement on your table to return a result set. Use the array_count_values function as demonstrated to get your count.

Perhaps a better way is to do what most sites do (Stack Overflow, Delicious, etc.) and force the user to enter their keywords. That way you can skip all the parsing up front.

1 Comment

Yes, I will make the user put their own keywords in, but I am working with existing data and "upgrading", I guess, to add this functionality, because tags/keywords will be implemented in the "new" system.
-1

If the string is not too long and you won't have memory issues with storing the string in arrays, how about this?

# string to parse, comes from the database as you suggested
$string = 'I at and Cubs PHP Cubs';

# string is now an array
$stringArray = explode(" ", $string);

# list of "obvious" words to exclude, this would probably come from a database table
$wordsToExclude = array('I', 'at', 'and');

# array that contains your "keywords"
# Array('Cubs', 'PHP', 'Cubs')
$keywordArray = array_diff($stringArray, $wordsToExclude);

# array with the keyword as the key and the count as the value
# Array('Cubs' => 2, 'PHP' => 1)
$countedValues = array_count_values($keywordArray);

Now you need to "search" the database for the keys in the $countedValues array. What does your table look like?
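Since the table layout is not known, the following is only a sketch of the insert-or-update step under an assumed schema: a hypothetical `keywords` table with a unique `word` column and a running `total` column. The function name `saveKeywordCounts()` is also made up. It uses PDO, and the demo runs against an in-memory SQLite database (which needs the `pdo_sqlite` extension) purely so it can be tried without a server.

```php
<?php
// Assumed, hypothetical schema: keywords(word TEXT PRIMARY KEY, total INTEGER).
// saveKeywordCounts() is an illustrative name, not an existing API.
function saveKeywordCounts(PDO $db, array $countedValues): void
{
    $select = $db->prepare('SELECT total FROM keywords WHERE word = ?');
    $insert = $db->prepare('INSERT INTO keywords (word, total) VALUES (?, ?)');
    $update = $db->prepare('UPDATE keywords SET total = total + ? WHERE word = ?');

    $db->beginTransaction();
    foreach ($countedValues as $word => $count) {
        $word = (string) $word; // numeric array keys come back as integers

        // Look the keyword up first to decide between UPDATE and INSERT.
        $select->execute(array($word));
        $exists = $select->fetchColumn() !== false;
        $select->closeCursor();

        if ($exists) {
            $update->execute(array($count, $word)); // add to the existing count
        } else {
            $insert->execute(array($word, $count)); // first time this word is seen
        }
    }
    $db->commit();
}

// Demo against an in-memory SQLite database.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE keywords (word TEXT PRIMARY KEY, total INTEGER NOT NULL)');

saveKeywordCounts($db, array('Cubs' => 2, 'PHP' => 1));
saveKeywordCounts($db, array('Cubs' => 1));

foreach ($db->query('SELECT word, total FROM keywords ORDER BY word') as $row) {
    echo $row['word'] . ': ' . $row['total'] . "\n";
}
```

The select-then-write pattern is used here because it works on any database PDO supports. On MySQL specifically, a single `INSERT ... ON DUPLICATE KEY UPDATE` statement on a table with a unique key on the word column should achieve the same thing in one query.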

Or of course you could avoid reinventing the wheel and Google "php tag cloud"...

Reference: PHP array functions

11 Comments

This is unusable in a real-life example.
Google "php tag cloud" will give you either HTML formatting or selecting from DB. In fact, the question has nothing to do with cloud and, possibly - with tags.
I am ignoring him, he is being unconstructive, I suggest you do the same. I appreciate ANY help, as I did say in the OP.
@Lea in fact I am being constructive. Sometimes we need some criticism too, not only code to copy/paste.
Of course. But that isn't what the question is about. Maybe you should go and ask your own question about that topic. And you could call it "Should this website be about giving out copy & paste examples, or help a poster learn through theory"... no doubt it would be a great question. But it's off-topic here. None of your input has helped aside from being critical, which again, isn't the point of the post, or this forum. People don't ask questions for you to criticize them. And people don't give answers for you to prod them. You haven't helped a bit, therefore you haven't been constructive.