0

I need to check some input strings against a huge (and growing) list of strings coming from a CSV file (1,000,000+ entries). I currently load every string into an array and check against it with in_array(). The code looks like this:

$filter = array();
$filter = ReadFromCSV();

$input = array("foo","bar" /* more elements... */);
foreach($input as $i){
  if(in_array($i,$filter)){
    // do stuff
  }
}

It already takes some time, and I was wondering if there is a faster way to do this?

1
  • Maybe you should think about using a DB instead of reading the values from CSV into memory each time. Commented Aug 31, 2014 at 11:57

2 Answers

3

in_array() performs a linear search: it checks every element in the array until it finds a match, so each lookup is O(n) on average.

Since you are comparing strings, you can store the filter strings as array keys instead of values and look them up with array_key_exists(). PHP arrays are hash tables, so a key lookup takes constant time, O(1), on average.

Some code:

$filter = ReadFromCSV();
$filter = array_flip($filter); // swap keys and values: each string becomes a key

$input = array("foo","bar" /* more elements... */);
foreach($input as $i){
  if(array_key_exists($i,$filter)){ // hash lookup instead of a linear scan
    // do stuff
  }
}
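If the CSV is read row by row anyway, the lookup table can be built directly while reading, which avoids holding both the list and its flipped copy in memory. The following is only a sketch: `loadFilterKeys()` is a hypothetical stand-in for `ReadFromCSV()`, and it assumes the string to match is in the first column of each row. The temporary file exists only to make the example self-contained.

```php
<?php
// Hypothetical helper (not from the question): reads the first column of
// each CSV row and stores it as an array key for constant-time lookups.
function loadFilterKeys(string $path): array
{
    $handle = fopen($path, 'r');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }

    $keys = [];
    while (($row = fgetcsv($handle, 0, ',', '"', '\\')) !== false) {
        // fgetcsv() returns [null] for a blank line; skip those.
        if ($row[0] === null) {
            continue;
        }
        $keys[$row[0]] = true;
    }
    fclose($handle);

    return $keys;
}

// Demo data written to a temporary file (assumed layout: one string per row).
$path = tempnam(sys_get_temp_dir(), 'filter');
file_put_contents($path, "foo\nbaz\n\nqux\n");
$filter = loadFilterKeys($path);
unlink($path);

$matches = [];
foreach (['foo', 'bar'] as $i) {
    // isset() is safe here because every stored value is true, never null.
    if (isset($filter[$i])) {
        $matches[] = $i;
    }
}
```

Because every stored value is `true`, `isset($filter[$i])` and `array_key_exists($i, $filter)` give the same answer here.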

2 Comments

Wow.. that really makes a difference. Why can't in_array be that fast?
It only works that easily because you are comparing strings. Array keys are restricted to integers and strings, whereas array values can be anything.
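A small sketch of what that key restriction means in practice; the sample values are made up for illustration. Strings that look like decimal integers are cast to integer keys, which does not affect lookups, while values that cannot be keys (floats, for example) still need a strict in_array() scan.

```php
<?php
// Strings become keys; "123" is cast to the integer key 123, "08" stays a string.
$filter = array_flip(['foo', '123', '08']);
$keys = array_keys($filter);

// Lookups still succeed because the lookup key is cast the same way.
$foundAsString = array_key_exists('123', $filter);
$foundAsInt    = array_key_exists(123, $filter);

// Floats are not valid array keys, so for such values a linear,
// strict in_array() search remains the straightforward option.
$values = [1.5, 2.5];
$foundFloat = in_array(1.5, $values, true);
```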
0

That's what indexes were invented for.

It's not a matter of in_array() speed. As the data grows, you should probably consider loading it into a real DBMS and using indexes.
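As a rough illustration of the indexed approach, here is a sketch using PDO with an in-memory SQLite database. It assumes the pdo_sqlite extension is installed; the table name `filter` and column name `value` are made up for this example, and a real setup would use a persistent database rather than `:memory:`.

```php
<?php
// Assumption: pdo_sqlite is available. Table and column names are illustrative.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// PRIMARY KEY creates an index on value, so lookups do not scan the table.
$pdo->exec('CREATE TABLE filter (value TEXT PRIMARY KEY)');

// Load the strings once, inside a transaction to keep inserts fast.
$insert = $pdo->prepare('INSERT OR IGNORE INTO filter (value) VALUES (?)');
$pdo->beginTransaction();
foreach (['foo', 'baz', 'qux'] as $value) {
    $insert->execute([$value]);
}
$pdo->commit();

// Check each input string with an indexed lookup.
$lookup = $pdo->prepare('SELECT 1 FROM filter WHERE value = ? LIMIT 1');
$matches = [];
foreach (['foo', 'bar'] as $i) {
    $lookup->execute([$i]);
    if ($lookup->fetchColumn() !== false) {
        $matches[] = $i;
    }
    $lookup->closeCursor();
}
```

With a persistent database the CSV only has to be imported when it changes, instead of being read into memory on every request.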
