
I need to harness similar_text() for an array of values that look something like this:

$strings = ["lawyer" => 3, "business" => 3, "lawyers" => 1, "a" => 3];

What I'm trying to do is find the words that are practically the same, i.e. lawyer and lawyers in the above array, and add their counts together in a new array.

So lawyer would be 4 as lawyers would be associated to the original string of lawyer.

Keep in mind that this array will only ever contain single words, and its length is unspecified: it could range from 1 to more than 99 entries.

I had no idea where to start with this, so I gave it a crack with a foreach loop as you'll see below, but the output isn't what I expected.

foreach ( $strings as $key_one => $count_one ) {
    foreach ( $strings as $key_two => $count_two ) {
        similar_text($key_two, $key_one, $percent);
        if ($percent > 80) {
            if(!isset($counts[$key_one])) {
                $counts[$key_one] = $count_one;
            } else {
                $counts[$key_one] += $count_two;
            }
        }
    }
}

Note: The percent match is at 80 for this example (as the match for lawyer & lawyers is ~92%)

Which ends up giving me something similar to the following:

Array
(
    [lawyer] => 4
    [business] => 3
    [a] => 3
    [lawyers] => 2
)

Where I require it to be:

Array
(
    [lawyer] => 4
    [business] => 3
    [a] => 3
)

Notice how I need lawyers to be removed and its count added to lawyer.

  • Why don't you transform it to OOP? Then you could just register and deregister objects: all words are initially registered to a master object, and if a match is found, the match is removed from the master and registered to the word it matched. That way you preserve the values and can dynamically insert words if needed. Commented Feb 25, 2015 at 7:34
  • Maybe you want to take a look at the Levenshtein distance algorithm; there is a short description on Wikipedia. Commented Feb 25, 2015 at 7:35
  • @MichaelDibbets Thanks for that comment! I've figured out how to get it now, I just have to remove the item from the array so that it doesn't get set in the newly created array. If you want to post that as an answer I'll accept it. Commented Feb 25, 2015 at 7:42
  • @RaphaelMüller Thank you for that comment as well. I had a read of the wiki and it is a rather interesting algorithm. Commented Feb 25, 2015 at 7:42
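
For reference, PHP ships a built-in levenshtein() function, so the distance mentioned in the comments can be tried directly. The snippet below is only a rough sketch: the "within one edit" rule and the $isSameWord variable are assumptions for illustration, not something the question or the comments specify.

```php
<?php
// levenshtein() returns the minimum number of single-character
// insertions, deletions, or replacements needed to turn one string into the other.
$distance = levenshtein('lawyer', 'lawyers');
echo $distance, "\n"; // a single insertion (the trailing "s")

// Hypothetical rule (an assumption, not from the question):
// treat two words as the same when they are at most one edit apart.
$isSameWord = $distance <= 1;
var_dump($isSameWord);
```

A fixed edit distance is probably too blunt for very short words: "a" and "i" are also one edit apart, so a real implementation would likely scale the allowed distance with the word length.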

2 Answers


Your difficulty is that just as lawyer is similar to lawyers, lawyers is also similar to lawyer. So they both get their count bumped up by the other.
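
To make that symmetry concrete, here is a minimal sketch that compares the two words in both directions, using the 80% threshold from the question. The variable names $forward and $backward are just illustrative.

```php
<?php
// Compare the two words in both directions. similar_text() stores the
// similarity percentage in its third argument (passed by reference).
similar_text('lawyer', 'lawyers', $forward);
similar_text('lawyers', 'lawyer', $backward);

// Both values are about 92.3: 6 shared characters * 2 / 13 total characters * 100.
// Each word therefore passes the "> 80" check when the other one is the outer key.
printf("%.1f%% / %.1f%%\n", $forward, $backward); // prints "92.3% / 92.3%"
```

Because both comparisons pass the threshold, the outer loop creates an entry for lawyer and, later, another one for lawyers.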

Try this:

foreach ( $strings as $key_one => &$count_one ) {
    if ($count_one == 0) continue; // skip it if we've already processed it
    if (!isset($counts[$key_one])) {
        $counts[$key_one] = $count_one;
        $count_one = 0; // zeroed so the 100% self-match below adds nothing
    }
    foreach ( $strings as $key_two => &$count_two ) {
        similar_text($key_two, $key_one, $percent);
        if ($percent > 80) {
            $counts[$key_one] += $count_two;
            $count_two = 0; // mark this word as processed
        }
    }
}
unset($count_one, $count_two); // break the references left over from the loops

The disadvantage of that is that it zeroes out the counts in your original $strings array, which may not be ideal. Here's another approach, keeping track of already-processed strings in another hash:

$already = $counts = array(); // not really necessary, but nice to init
foreach ( $strings as $key_one => $count_one ) {
    if (isset($already[$key_one])) continue; // skip if already merged into an earlier word
    $counts[$key_one] = 0; // start at 0: the inner loop adds $key_one's own count (100% match)
    foreach ( $strings as $key_two => $count_two ) {
        if (isset($already[$key_two])) continue; // never count a word twice
        similar_text($key_two, $key_one, $percent);
        if ($percent > 80) {
            $counts[$key_one] += $count_two;
            $already[$key_two] = true;
        }
    }
}

I would recommend the 2nd solution.
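
If it helps, the second approach can be wrapped in a reusable function and checked against the array from the question. This is only a sketch: the function name mergeSimilarCounts and the $threshold parameter are hypothetical, and it assumes every key is a non-empty word.

```php
<?php
// Hypothetical helper (name and $threshold parameter are illustrative assumptions).
// Merges the counts of words whose similar_text() percentage exceeds $threshold.
function mergeSimilarCounts(array $strings, float $threshold = 80.0): array
{
    $counts = [];
    $already = []; // words that have been merged into some group

    foreach ($strings as $keyOne => $countOne) {
        if (isset($already[$keyOne])) {
            continue; // already merged into an earlier word
        }
        $counts[$keyOne] = 0; // the inner loop adds $keyOne's own count (100% match)
        foreach ($strings as $keyTwo => $countTwo) {
            if (isset($already[$keyTwo])) {
                continue; // never count a word twice
            }
            // Cast to string because PHP turns numeric-looking keys into integers.
            similar_text((string) $keyTwo, (string) $keyOne, $percent);
            if ($percent > $threshold) {
                $counts[$keyOne] += $countTwo;
                $already[$keyTwo] = true;
            }
        }
    }
    return $counts;
}

$strings = ["lawyer" => 3, "business" => 3, "lawyers" => 1, "a" => 3];
print_r(mergeSimilarCounts($strings));
// Array
// (
//     [lawyer] => 4
//     [business] => 3
//     [a] => 3
// )
```

Note that the grouping depends on order: whichever word of a similar pair comes first in $strings becomes the key that collects the counts.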



You can always use

unset( $counts[$key_two] ) ;
