
I have a large set of data stored in a multi-dimensional array. An example of the structure is shown below:

Array
(
    [1] => Array
        (
            [0] => motomummy.com
            [1] => 1921
            [2] => 473
        )
    [4] => Array
        (
            [0] => kneedraggers.com
            [1] => 3051
            [2] => 5067
        )
)

I also have a table in a MySQL database that currently contains ~80K domain names. This list will grow by roughly 10K+ domain names per month. The goal is to compare Array[][0] (the domain name) against the MySQL table and return an array that keeps each row's values (key preservation is not important) but contains only the rows whose domain is not already in the database.

Please note that I only want to compare the first index (the domain) alone, NOT the entire inner array.

The initial multi-dimensional array is assumed to be enormous (most likely anywhere from 100K to 10 million entries). What is the best way to get back only the data that is not contained in the database?

What I am doing now is loading the complete list of domains from the database into an array, then using the following function to compare each value in the initial array against that array. This is, obviously, horribly slow and inefficient, because every row is checked against every stored domain.

// Keep only the rows whose domain does not match any domain from the database
$clean = array_filter($INITIAL_LIST, function ($elem) {
    $wordOkay = true;

    // Compare the row's domain against every domain loaded from the database;
    // stripos() is a case-insensitive substring match
    foreach ($this->domains as $domain) {
        if (stripos($elem[0], $domain) !== false) {
            $wordOkay = false; // found in the database list, so drop this row
            break;
        }
    }

    return $wordOkay;
});

Some pseudo code or even actual code would be very helpful at this point.
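A minimal sketch of a faster in-PHP version, assuming an exact (case-insensitive) match on the domain is what is wanted. Note that the stripos() loop above does a substring match, so a stored "example.com" would also reject "myexample.com"; this sketch does not, so it is not a drop-in replacement if substring matching is really intended. The helper name filterNewDomains is hypothetical, and $dbDomains is assumed to be the flat list of domain strings already fetched from MySQL.

```php
<?php
// Sketch: exact, case-insensitive domain filtering with a hash lookup.
// Assumption: $dbDomains is the flat list of domains fetched from MySQL;
// filterNewDomains is a hypothetical helper, not part of any library.

/**
 * Return the rows of $rows whose first element (the domain) does not
 * appear in $dbDomains. Original keys are not preserved.
 */
function filterNewDomains(array $rows, array $dbDomains): array
{
    // Build a set (domain => true) once: O(m) for m stored domains.
    $known = [];
    foreach ($dbDomains as $domain) {
        $known[strtolower($domain)] = true;
    }

    // Each lookup is an average O(1) hash probe instead of a scan.
    $result = [];
    foreach ($rows as $row) {
        if (!isset($known[strtolower($row[0])])) {
            $result[] = $row; // keep the whole row, re-indexed from 0
        }
    }
    return $result;
}

// Usage with the sample structure from the question.
$initial = [
    1 => ['motomummy.com', 1921, 473],
    4 => ['kneedraggers.com', 3051, 5067],
];
$clean = filterNewDomains($initial, ['MotoMummy.com', 'example.org']);
print_r($clean);
```

The total cost becomes O(n + m) rather than O(n * m) for n input rows and m stored domains, at the price of holding the lookup array in memory.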

  • So you want the entries of "array" that are not present in the database? Commented Dec 21, 2012 at 16:44
  • Precisely, maybe I'll reword my question a bit to clarify. Commented Dec 21, 2012 at 16:45
  • Why not just use a SELECT statement? Commented Dec 21, 2012 at 16:52
  • That was my next thought, but for performance's sake, is that the most efficient and fastest option? Running a SELECT against 80,000 rows doesn't seem that efficient, but I really don't know. Commented Dec 21, 2012 at 16:58
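On the SELECT question in the comments: one way to avoid pulling all ~80K stored domains into PHP is to ask MySQL, in batches, which of the candidate domains already exist. This is a minimal sketch; the table name `domains`, the column `domain`, the helper buildExistsQueries and the default batch size of 1000 are all assumptions. It only builds parameterised queries, so it runs without a database.

```php
<?php
// Sketch: batched "which of these already exist?" queries.
// Assumptions: table `domains` with column `domain` (hypothetical names);
// buildExistsQueries is a hypothetical helper.

/**
 * Split $candidates into batches and build one parameterised
 * "SELECT ... WHERE domain IN (?, ?, ...)" query per batch.
 * Returns a list of [sql, params] pairs for PDO::prepare()/execute().
 */
function buildExistsQueries(array $candidates, int $batchSize = 1000): array
{
    $queries = [];
    foreach (array_chunk($candidates, $batchSize) as $batch) {
        // One "?" placeholder per domain in this batch.
        $placeholders = implode(', ', array_fill(0, count($batch), '?'));
        $queries[] = [
            "SELECT domain FROM domains WHERE domain IN ($placeholders)",
            $batch,
        ];
    }
    return $queries;
}

$candidates = ['motomummy.com', 'kneedraggers.com', 'example.org'];
foreach (buildExistsQueries($candidates, 2) as [$sql, $params]) {
    echo $sql, ' -- ', implode(', ', $params), PHP_EOL;
}
```

Each pair would then be run with PDO, e.g. `$stmt = $pdo->prepare($sql); $stmt->execute($params);` followed by `$stmt->fetchAll(PDO::FETCH_COLUMN)` to collect the domains that already exist; any candidate missing from that result is new. With 10 million candidates and batches of 1000 this is still about 10,000 round trips, which is one reason the temp-table join in the answer below can be preferable.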

1 Answer


Use the DBMS! It was made for stuff like that.

  • Create a temporary table temp { id (filled with the array index); url (filled with the URL) }

  • Fill it with your array's data

  • Ideally create an index on temp.url

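  • Query the database for the rows of temp that have no match in urls:

    SELECT temp.* FROM `temp` LEFT JOIN `urls`
    ON urls.url = temp.url
    WHERE urls.url IS NULL;

    (the table urls is your existing data; the join condition belongs in the ON clause, because putting it in WHERE together with urls.url IS NULL would filter out every row)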
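The steps above, sketched end to end in PHP with PDO. To keep the example self-contained it uses an in-memory SQLite database (this requires the pdo_sqlite extension) as a stand-in for MySQL; with a mysql: DSN the same SQL statements are expected to apply. The table is named temp_domains rather than temp because TEMP is a keyword in SQLite, and the names urls and url are illustrative.

```php
<?php
// Sketch of the temp-table + LEFT JOIN ... IS NULL (anti-join) approach.
// Assumption: in-memory SQLite stands in for MySQL so this runs on its own.

$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// Existing data (in the real setup, the ~80K-row MySQL table).
$pdo->exec('CREATE TABLE urls (url VARCHAR(255) PRIMARY KEY)');
$pdo->exec("INSERT INTO urls (url) VALUES ('motomummy.com')");

$initial = [
    1 => ['motomummy.com', 1921, 473],
    4 => ['kneedraggers.com', 3051, 5067],
];

// 1. Temporary table keyed by the original array index.
$pdo->exec('CREATE TEMPORARY TABLE temp_domains (id INTEGER PRIMARY KEY, url VARCHAR(255))');

// 2. Fill it in one transaction, reusing a single prepared statement.
$pdo->beginTransaction();
$insert = $pdo->prepare('INSERT INTO temp_domains (id, url) VALUES (?, ?)');
foreach ($initial as $index => $row) {
    $insert->execute([$index, $row[0]]);
}
$pdo->commit();

// 3. Index the join column after the bulk load.
$pdo->exec('CREATE INDEX temp_domains_url ON temp_domains (url)');

// 4. Anti-join: temp rows with no matching row in urls.
$ids = $pdo->query(
    'SELECT t.id FROM temp_domains t
     LEFT JOIN urls u ON u.url = t.url
     WHERE u.url IS NULL
     ORDER BY t.id'
)->fetchAll(PDO::FETCH_COLUMN);

// Map the surviving ids back to the full rows of the original array.
$clean = [];
foreach ($ids as $id) {
    $clean[] = $initial[$id];
}
print_r($clean);
```

Only the index and the domain are copied into the table, so the last loop recovers the full rows from the original array. A temporary table is private to the connection and is dropped when that connection closes.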
