3

How can I efficiently determine if a given string contains two strings?

For example, let's say I'm given the string: abc-def-jk-l. This string either contains two strings divided by a -, or it's not a match. The matching possibilities are:

Possible Matches for "abc-def-jk-l" :
abc           def-jk-l
abc-def       jk-l
abc-def-jk    l

Now, here are my columns of strings to match:

Column I       Column II
-------        -------
1. abc-def     A. qwe-rt
2. ghijkl      B. yui-op
3. mn-op-qr    C. as-df-gh
4. stuvw       D. jk-l

How can I efficiently check to see if the given string matches two strings in the columns above? (The above is a match - matching abc-def and jk-l)

Here are some more examples:

abc-def-yui-op   [MATCH - Matches 1-B]
abc-def-zxc-v    [NO MATCH - Matches 1, but not any in column II.]
stuvw-jk-l       [MATCH - Matches 4-D]
mn-op-qr-jk-l    [Is this a match?]

Now, given a strings above, how can I efficiently determine matches? (Efficiency will be key, because columns i and ii will each have millions of rows on indexed columns in their respected tables!)

UPDATE: The order will always be column i, then column ii. (or "no match", which could mean it matches only one column or none)

Here's some php to help:

<?php

$arrStrings = array('abc-def-yui-op','abc-def-zxc-v','stuvw-jk-l','stuvw-jk-l');

foreach($arrStrings as $string) {
    print_r(stringMatchCheck($string));
}

function stringMatchCheck($string) {

   $arrI = array('abc-def','ghijkl','mn-op-qr','stuvw');
   $arrII = array('qwe-rt','yui-op','as-df-gh','jk-l');

   // magic stackoverflow help goes here!

    if ()
        return array($match[0],$match[1]);
    else
        return false;

}

?>
4
  • Is the string always going to begin with a match from column I and end with a match from column II if there is a match? Your examples seem to indicate that is the case, and would make a big difference if you wanted to be efficient. Commented Jun 27, 2012 at 16:09
  • Your first example match is 1-B not 1-A Commented Jun 27, 2012 at 17:08
  • @PaoloBergantino - yes, column i will always precede column ii, unless it's a no-match scenario (updated question above to clarify). Commented Jun 27, 2012 at 18:06
  • @SpacedMonkey - good eye. corrected. Thanks. Commented Jun 27, 2012 at 18:07

2 Answers 2

2

Just use PHP's strpos(). Loop until you find an entry from $arrI in $string using strpos(), and do the same for $arrII.

More info on strpos(): http://php.net/manual/en/function.strpos.php

EDIT:

To help you see what I'm talking about, here's your function:

function stringMatchCheck($string) {

    $arrI = array('abc-def','ghijkl','mn-op-qr','stuvw');
    $arrII = array('qwe-rt','yui-op','as-df-gh','jk-l');

    $match = array(NULL, NULL);

    // get match, if any, from first group    
    for ($i=0; $i<count($arrI) && !is_null($match[0]); $i++) {
        if (strpos($string,$arrI[$i]) !== false) {
            $match[0]=$arrI[$i];
        }
    }

    if (!is_null($match[0])) {
        // get match, if any, from second group group    
        for ($i=0; $i<count($arrII) && !is_null($match[1]); $i++) {
            if (strpos($string,$arrII[$i]) !== false) {
                $match[1]=$arrII[$i];
            }
        }
    }


    if (!is_null($match[0]) && !is_null($match[1])) {
        return $match;
    } else {
        return false;
    }
}
Sign up to request clarification or add additional context in comments.

3 Comments

Thanks. But, unless I'm misunderstanding, your strategy could require looping through millions of records, at least twice. This would be very inefficient.
You're thinking about it wrong. You loop through your big list (1 million records) once. You take each entry in the big list, and try to find the small list items in it. Quite efficient because you can short-circuit the search for the small list items once you find one in each group.
Ah, I see where you're coming from. Thanks. I haven't experimented with it yet, but I could see that resulting in similar response times. I'll experiment later. Thanks again. Upvoted.
1

For efficiency sake, rather than loop through every entry in each column, split the string into as many different words as it takes and search for every word combination. Basically what you mention as possible matches.

$words = explode("-", $string);
$end = count($words) - 1;

for ( $i = 1; $i < $end; $i++ ) {
    $partOne = array_slice($words, 0, $i);
    $parttwo = array_slice($words, $i);
    $wordOne = implode("-" , $partOne);
    $wordTwo = implode("-" , $partTwo);

    /* SQL to select $wordOne and $wordTwo from the tables */
}

3 Comments

Interesting approach. Trying it out now. Thanks.
that's a great solution. The only change would be to add "or equal" to your for loop (i.e. $i = 1; $i <= $end; $i++) so you can get the last iteration. But for my scenario it works perfectly since column ii always includes two parts. This solution limits the lookups to just a couple and can be handled very efficiently. Thanks!
@Ryan, I'm unclear about the efficiency here. It looks as if you're going to be querying the record set of a million records multiple times, for each possible combination of words. Is that right?

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.