0

EDIT: Thanks to nice_dev for providing an excellent solution below!

I had to rewrite part of his query for correct results:

From:

$tmpmatch['unmatched'] = array_filter($deck['main_deck'],
            function ($val) use (&$tmparray, &$matched) {
              $matched[ $val ] = $matched [$val] ?? 0;
              $matched[ $val ]++;
              return $matched[ $val ] <= $tmparray[ $val ];
            }
        );   

To:

$tmpmatch['unmatched'] = array_filter($deck['main_deck'],
 function ($val) use (&$tmparray) {
  return empty($tmparray[$val]) || !$tmparray[$val]++;
 }
);

Original:

I'm attempting to compare arrays to get a new array of matched and unmatched items. I have a loop that goes through roughly 55,000 items. The processing of this script can take upwards of 20+ minutes to attempt to complete and I've narrowed it down to both usage of array_intersect and array_filter within the foreach. Ideally, I need it to complete much faster. If I limit the foreach to 1000 items it still takes upwards of ~3 minutes to complete which is slow for the client-side experience.

If I remove them, the script completes almost immediately. As such, I will include only these relevant pieces in the code below.

I'm using a custom array_intersect_fixed function as regular array_intersect returned wrong results with duplicate values as per here.

Explanations:

totalidsarray = An array of numbers such as ['11233', '35353, '3432433', '123323']. Could contain thousands of items.

$deck['main_deck'] = An array of numbers to compare against $totalidsarray. Similar structure. Max length is 60 items.

foreach($dbdeckarray as $deck){

         $tmparray = $totalidsarray;

        //Get an array of unmatched cards between the deck and the collection
        //Collection = $tmparray
        $tmpmatch['unmatched'] = array_filter($deck['main_deck'],
            function ($val) use (&$tmparray) {
                $key = array_search($val, $tmparray);
                if ( $key === false ) return true;
                unset($tmparray[$key]);
                return false;
            }
        );   

        //Get an array of matched cards between the deck and the collection
        $tmpmatch['matched'] = array_intersect_fixed($deck['main_deck'], $totalidsarray);

        //Push results to matcharray
        $matcharray[] = $tmpmatch;

}

//Built in array_intersect function returns wrong result when input arrays have duplicate values.
function array_intersect_fixed($array1, $array2) {
    $result = array();
    foreach ($array1 as $val) {
        if (($key = array_search($val, $array2, FALSE))!==false) {
            $result[] = $val;
            unset($array2[$key]);
        }
    }
    return $result;
}

To make matters worse, I have to do 2 further matched/unmatched checks within that same foreach loop against another array extra_deck, further increasing processing time.

Is there a more optimized approach I can take to this?

EDIT: Explanation of what the code needs to achieve.

The script will retrieve the user's card collection of cards that they own from a card game. This is assigned into totalidsarray. It will then query every deck in the database (~55,000) and compare the collection you own against the built deck of cards (main_deck). It then attempted to extract all owned cards (matched) and all un-owned cards (unmatched) into two arrays. Once the full foreach loop is done, the client-side returns a list of each deck alongside the matched cards/unmatched cards of each (with a % match for each).

8
  • The problem is the length of time it takes to complete. I'd like to lower it. Commented Oct 11, 2022 at 11:18
  • 55k isn't a lot, so it sounds like there's probably a lot of wasted effort (too much branching or checking something repeatedly), and/or process quagmires. The latter I've had with in_array(), which can introduce performance issues at really low numbers (hundreds). So I see array_search() in both functions and wonder if there's just too much effort, leaning on this function, and not enough setup (like flipping values as keys, which resolves in_array() performance issues by having a reference lookup, which is fast). json_decode() is also a code smell here, as others point out. Commented Oct 11, 2022 at 11:19
  • I Think one problem is in json_decode(), try with json_decode($deck['main_deck']), true), the default is to decode into an object. So in your code it get converted to an object, then it get casted to an array. You do so twice, just do only once an assign to a var and use it. Commented Oct 11, 2022 at 11:20
  • Why is json_decode($deck['main_deck'] inside your loop (and not even just once, but in two places), why aren't you doing this once, before? Commented Oct 11, 2022 at 11:20
  • 1
    The foreach loop is ran once. The database query pushes all the decks into $dbdeckarray. Closes the database connection, then moves onto the foreach. Commented Oct 11, 2022 at 11:35

2 Answers 2

2

A couple of optimizations I can suggest:

  • The array_intersect_fixed routine you have is quadratic in nature in terms of getting the result, because it is 2 nested loops under the hood. We can use array_count_values to optimize it to work in linear time(which uses a map).

  • json_decode() doesn't need to be done twice for every deck. If you do it once and use it wherever needed, it should work just fine(unless you make any edits in place which I don't find right now) . It also needs to be decoded to an array and not to an object using the true flag.

  • For your array_filter, the comparison is also quadratic in nature. We will use array_count_values again to optimize it and use a $matched array. We keep counting the frequency of elements and if any of them surpasses count in $tmparray, we return false, else, we return true.

Snippet:

<?php 

$tmparray = array_count_values($totalidsarray);

foreach($dbdeckarray as $deck){
        $matched = [];        
        $deck['main_deck'] = json_decode($deck['main_deck'], true);
        $tmpmatch['unmatched'] = array_filter($deck['main_deck'],
            function ($val) use (&$tmparray, &$matched) {
              $matched[ $val ] = $matched [$val] ?? 0;
              $matched[ $val ]++;
              return $matched[ $val ] <= $tmparray[ $val ];
            }
        );   

        $tmpmatch['matched'] = array_intersect_fixed($deck['main_deck'], $tmparray);
        $matcharray[] = $tmpmatch;
}

function array_intersect_fixed($array1, $array2) {
    $result = array();
    $matched = [];
    foreach ($array1 as $val) {
        $matched[ $val ] = $matched[ $val ] ?? 0;
        $matched[ $val ]++;
        if (isset($array2[ $val ]) && $matched[ $val ] <= $array2[ $val ]) {
            $result[] = $val;
        }
    }
    return $result;
}

Note: array_intersect_fixed expects $array2 to be in the Hashmap way by default. If you wish to use it elsewhere, make sure to pass array_count_values of the array as 2nd parameter or use a third parameter to indicate a flag check otherwise.

Sign up to request clarification or add additional context in comments.

6 Comments

Wow, amazing! Thank you for explaining this. Performance has improved massively and it's loading exceptionally faster now (<1 min as opposed to 20-30+ mins). However, the unmatched results keep returning everything as being unmatched (so a fully array) even when matched will say otherwise. Changing it over to array_diff($deck['main_deck'], $totalidsarray); fixes this but as the cost of some of the performance.
@GenesisBits That's good to hear. I have made updates to my answer. Can you copy as is and check again? It should work for unmatched.as well now
seems to be the same result for unmatched unfortunately where it fills the array with everything (so they are all marked as unmatched).
After some further testing, it seems like unmatched is returning the exact same result set as matched given the info above
I got it working! Based on the code you provided I rewrote it. I will add it to the OP!
|
1

Beside @nice_dev suggestion your code can be simplified.

The unmatched part is an array diff

array_diff($deck['main_deck'], $tmparray);

The array_intersect_fixed(), if the problem are duplicated value in array, can be avoided by running array_unique() on the array (I guess is $deck['main_deck']) before calling array_intersect()

This will also speed up array_diff() as it will have less array element to compare.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.