
I have a large 2D array (576,000 x 4) and a huge database (millions of records and 10 columns; its size is in the gigabytes). The array, of course, has far fewer rows than the database.

I need an efficient way to compare the 2D array with the database and delete the matching lines from the 2D array only.

Does anyone have an idea how I could do this efficiently? Speed is very important to me.


I tried it like this:

$query = mysqli_query($config, "SELECT * FROM sec") or die(mysqli_error($config));
while ($row = mysqli_fetch_assoc($query)) {
    if (isset($arr[$row['CHROM']][$row['POS']])) {
        unset($arr[$row['CHROM']][$row['POS']]); // delete the line from the 2D array
    }
}

But I don't know how efficient this is, because I have only tried it on a small database. It also loads every record of the database into the PHP script, which causes a memory problem.
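The memory problem in this first attempt most likely comes from buffering: by default mysqli_query() copies the whole result set into PHP memory. Passing MYSQLI_USE_RESULT as the third argument returns an unbuffered result that is read from the server row by row (no other query can run on that connection until the result has been fully read or freed). A minimal sketch, with a hypothetical function name, that isolates the filtering step so it works with either a streamed result or a plain array. Note that it still scans every row of sec, so it addresses memory use, not running time.

```php
<?php
/**
 * Removes from $arr every [$chrom][$pos] entry whose key pair appears in $rows.
 * $rows may be any iterable of associative arrays with 'CHROM' and 'POS' keys:
 * a plain array, or a mysqli_result fetched with MYSQLI_USE_RESULT (read row by row).
 */
function removeMatchingRows(array $arr, iterable $rows): array
{
    foreach ($rows as $row) {
        // isset() on nested keys is a hash lookup: O(1) per database row
        if (isset($arr[$row['CHROM']][$row['POS']])) {
            unset($arr[$row['CHROM']][$row['POS']]);
        }
    }
    return $arr;
}

// Usage with a live connection ($config as in the question):
// $result = mysqli_query($config, "SELECT CHROM, POS FROM sec", MYSQLI_USE_RESULT);
// $arr = removeMatchingRows($arr, $result);
// mysqli_free_result($result);
```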

Another approach I tried is this:

foreach ($arr as $chr=>$v) {
    foreach ($v as $pos=>$val) {
        $query = mysqli_query($config, "SELECT * FROM sec WHERE CHROM='$chr' AND POS='$pos'") or die(mysqli_error($config));
        if (mysqli_num_rows($query) > 0) {
            unset($arr[$chr][$pos]); // delete the line from the 2D array
        }
    }
}

But this is not a good solution either, because it takes far too long.
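The second attempt is slow mainly because it makes one database round trip per array element (576,000 queries). A common alternative is to look up many (CHROM, POS) pairs per query with a row constructor, WHERE (CHROM, POS) IN ((?, ?), ...), and unset whatever comes back. This is a sketch under assumptions: the function name and the batch size of 1,000 are hypothetical (tune the size so each statement stays under max_allowed_packet), and the lookups are only fast if sec has an index on (CHROM, POS). MySQL 5.7 and later can use a range scan on such an index for this row-constructor form; older servers may not, in which case the OR-of-ANDs form is the fallback.

```php
<?php
/**
 * Splits the (CHROM, POS) keys of $arr into batches and yields, for each batch,
 * a parameterised SELECT plus its flat parameter list.
 * $batchSize is an assumption; tune it so each statement stays below max_allowed_packet.
 */
function buildLookupBatches(array $arr, int $batchSize = 1000): Generator
{
    $pairs = [];
    foreach ($arr as $chr => $positions) {
        foreach ($positions as $pos => $unused) {
            $pairs[] = [(string) $chr, (string) $pos];
        }
    }
    foreach (array_chunk($pairs, $batchSize) as $batch) {
        // one "(?, ?)" placeholder group per pair
        $groups = implode(', ', array_fill(0, count($batch), '(?, ?)'));
        $sql = "SELECT CHROM, POS FROM sec WHERE (CHROM, POS) IN ($groups)";
        // flatten [[c1, p1], [c2, p2]] into [c1, p1, c2, p2] to match the placeholders
        $params = array_merge(...$batch);
        yield [$sql, $params];
    }
}

// Usage with a live connection ($config as in the question; needs PHP 8.1+ for the
// $params argument of mysqli_stmt_execute() and mysqlnd for mysqli_stmt_get_result()):
// foreach (buildLookupBatches($arr) as [$sql, $params]) {
//     $stmt = mysqli_prepare($config, $sql);
//     mysqli_stmt_execute($stmt, $params);
//     foreach (mysqli_stmt_get_result($stmt) as $row) {
//         unset($arr[$row['CHROM']][$row['POS']]);
//     }
//     mysqli_stmt_close($stmt);
// }
```

With 576,000 entries and batches of 1,000, this replaces 576,000 round trips with 576.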


Edit:

My sec table looks like this:

[Image: screenshot of the sec table]

An item in the 2D array is accessed as $arr[some_CHROM][some_POS].

If some_CHROM equals the CHROM value of a row in the database AND some_POS equals the POS value in the same row, we have a match.
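One detail behind this definition of a match: with plain mysqli_query() results, column values arrive as strings by default, while the keys of $arr may be integers. PHP converts decimal-integer strings such as "12345" to int array keys, so a direct nested lookup still works; strings that are not in canonical integer form (for example "007") stay string keys and will not match. A small sketch with a hypothetical helper name:

```php
<?php
/** True when ($chrom, $pos) taken from a database row is a key pair of $arr. */
function isMatch(array $arr, string $chrom, string $pos): bool
{
    // PHP converts decimal-integer strings such as "12345" to int array keys,
    // so string values from mysqli find entries stored under int keys.
    // isset() is false when the stored value is null; use array_key_exists() if that can happen.
    return isset($arr[$chrom][$pos]);
}
```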


I build the 2D array from a file that the user uploads to the website, and I do not load it into MySQL.

  • I think T-SQL and MySQL are mutually exclusive. Commented Dec 26, 2014 at 13:33
  • You can't use T-SQL with a MySQL database. Do you mean you have an MSSQL (e.g. SQL Server) database? Commented Dec 26, 2014 at 13:34
  • Just to clarify, you are trying to compare records in a large database with what is in your 2D array, right? Commented Dec 26, 2014 at 13:41
  • @GolezTrol Good to know. I was searching Google for ideas, and T-SQL was one of the things I found there that I thought could work. Commented Dec 26, 2014 at 13:53
  • Where does your large array come from? If it comes from the database as well, then it's probably easier to do your comparison directly there. Commented Dec 26, 2014 at 13:56

3 Answers


The algorithm:

  1. Convert the file uploaded by the user into a CSV file (if it is not already in that format); this takes only a few lines of PHP code; see the function fputcsv();
  2. Create a buffer table, tbl1;
  3. Use LOAD DATA LOCAL INFILE to load the contents of the (local) CSV file into the buffer table tbl1;
  4. Use:

    DELETE tbl1
    FROM tbl1
        INNER JOIN tbl2 on tbl1.id = tbl2.id
    

    to delete from tbl1 the rows that have a match in tbl2. I assumed the matching field is named id in both tables; change it to match your design;

  5. Fetch the data from tbl1, format it as you wish, and send it to the browser;
  6. Clean up: DROP TABLE tbl1;
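The steps above assume a single id column, while in the question a match means equal CHROM and POS. A sketch of steps 1 to 4 and 6 adapted to that composite key; the function names, the column types and the composite primary key on the buffer table are assumptions, not part of the answer. Only the two key columns are written, so in step 5 you would read back the surviving (CHROM, POS) pairs and keep those entries of $arr. LOAD DATA LOCAL INFILE also needs local_infile enabled on the server and on the client (for mysqli, the MYSQLI_OPT_LOCAL_INFILE option).

```php
<?php
/** Step 1: writes one "CHROM,POS" line per entry of $arr to $csvPath; returns the line count. */
function writeKeysCsv(array $arr, string $csvPath): int
{
    $fh = fopen($csvPath, 'w');
    if ($fh === false) {
        throw new RuntimeException("Cannot open $csvPath for writing");
    }
    $lines = 0;
    foreach ($arr as $chr => $positions) {
        foreach ($positions as $pos => $unused) {
            // explicit separator, enclosure and escape match the LOAD DATA clause below
            fputcsv($fh, [$chr, $pos], ',', '"', '\\');
            $lines++;
        }
    }
    fclose($fh);
    return $lines;
}

/**
 * Steps 2-4 and 6 as SQL strings, matching on CHROM and POS together.
 * Column types are assumptions; align them with the sec table.
 * $table must be a safe identifier (letters, digits, underscores).
 */
function bufferTableStatements(string $table, string $csvPath): array
{
    // addslashes() is only adequate for a path the script generated itself (e.g. via tempnam());
    // with a live connection prefer mysqli_real_escape_string().
    $path = addslashes($csvPath);
    return [
        'create' => "CREATE TABLE `$table` (CHROM VARCHAR(20) NOT NULL, POS INT UNSIGNED NOT NULL, PRIMARY KEY (CHROM, POS))",
        'load'   => "LOAD DATA LOCAL INFILE '$path' INTO TABLE `$table` "
                  . "FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"' LINES TERMINATED BY '\\n' (CHROM, POS)",
        'delete' => "DELETE b FROM `$table` AS b INNER JOIN sec ON sec.CHROM = b.CHROM AND sec.POS = b.POS",
        'fetch'  => "SELECT CHROM, POS FROM `$table`",
        'drop'   => "DROP TABLE `$table`",
    ];
}
```

The statements would be run in the order create, load, delete, fetch, drop on one connection.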

Because the script processes a file uploaded by a user, you need to generate a unique name for the buffer table for each user to avoid concurrency issues. You can use a prefix and append the user ID to it so that two users never use the same table at the same time.
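A minimal sketch of the prefix-plus-user-ID naming scheme; the function name, the tbl1_ default prefix and the random suffix are assumptions (the suffix additionally keeps two simultaneous uploads by the same user apart). An alternative worth weighing is CREATE TEMPORARY TABLE: a temporary table is visible only to the connection that created it and is dropped automatically when that connection closes, so different users cannot collide on the name (with persistent connections it may outlive the request, so an explicit DROP is still advisable).

```php
<?php
/**
 * Returns a buffer table name like "tbl1_42_9f1c2a3b" for the given user.
 * The name contains only letters, digits and underscores, so it can be
 * interpolated into CREATE/DROP statements without quoting problems.
 */
function bufferTableName(int $userId, string $prefix = 'tbl1_'): string
{
    if ($userId < 0 || !preg_match('/^[A-Za-z0-9_]+$/', $prefix)) {
        throw new InvalidArgumentException('Expected a non-negative user ID and an alphanumeric prefix');
    }
    // 4 random bytes -> 8 hex characters; separates concurrent uploads by one user
    return $prefix . $userId . '_' . bin2hex(random_bytes(4));
}
```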


2 Comments

It might be worthwhile to build an index on tbl1.id (assuming there's already one on tbl2.id!) +1
An index on tbl2.id is required for the query to run as fast as it can. An index on tbl1.id is not really needed, but it can indeed help if tbl1 has many columns. Because there is no WHERE clause, all the rows of the table on the left side of the JOIN are examined. When it analyzes the query, MySQL's query planner puts the table with fewer rows on the left (tbl1 has fewer rows here). An index on tbl1.id helps because it allows MySQL to read the values of tbl1.id from the index instead of reading the table data: fewer bytes to load from storage, faster execution.

Try the following code:

$servername = "localhost";
$username = "root";
$password = "";

// The mysql_* extension was removed in PHP 7; this version uses mysqli instead.
$conn = mysqli_connect($servername, $username, $password)
    or die('MySQL Error: ' . mysqli_connect_error());

// Returns an array of [table name => row count] for every non-empty table in $dbname.
function countRowsPerTable(mysqli $conn, string $dbname): array
{
    $result = mysqli_query($conn, "SHOW TABLES FROM `$dbname`");
    if (!$result) {
        echo "DB Error, could not list tables\n";
        echo 'MySQL Error: ' . mysqli_error($conn);
        exit;
    }
    $counts = array();
    while ($row = mysqli_fetch_row($result)) {
        // COUNT(*) lets the server count the rows instead of sending every row to PHP
        $countResult = mysqli_query($conn, "SELECT COUNT(*) FROM `$dbname`.`{$row[0]}`");
        $numRows = (int) mysqli_fetch_row($countResult)[0];
        if ($numRows > 0) {
            $counts[$row[0]] = $numRows;
        }
    }
    mysqli_free_result($result);
    return $counts;
}

$database1 = countRowsPerTable($conn, "drupal7");
$database2 = countRowsPerTable($conn, "drupal71");

echo '<pre>';
print_r($database1);
print_r($database2);

// array_diff_assoc() compares table names and counts; array_diff() would compare counts only.
$test = array_diff_assoc($database1, $database2);
print_r($test);
die;
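When comparing two [table => row count] arrays, the choice of diff function matters: array_diff() compares values only, so a table is reported only if its count occurs nowhere in the other array, while array_diff_assoc() compares keys and values together. A small demonstration with made-up table names and counts:

```php
<?php
// Made-up row counts per table for two databases.
$database1 = ['node' => 10, 'users' => 5, 'cache' => 3];
$database2 = ['node' => 5, 'users' => 10, 'cache' => 3];

// Values only: 10, 5 and 3 all occur somewhere in $database2, so nothing is reported,
// even though 'node' and 'users' have different counts in the two databases.
$byValue = array_diff($database1, $database2);

// Keys and values: reports tables whose count differs or that are missing from $database2.
$byTable = array_diff_assoc($database1, $database2);

print_r($byValue); // empty array
print_r($byTable); // ['node' => 10, 'users' => 5]
```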


From your code snippet:

foreach ($arr as $chr=>$v) {
    foreach ($v as $pos=>$val) {
        $query = mysqli_query($config, "SELECT * FROM sec WHERE CHROM='$chr' AND POS='$pos'") or die(mysqli_error($config));
        if (mysqli_num_rows($query) > 0) {
            unset($arr[$chr][$pos]); // delete the line from the 2D array
        }
    }
}

I assume that you want to delete based on $chr and $pos.

So you could do the following: assemble a single query to rule them all* :)

$ors = array();
foreach ($arr as $chr=>$v) {
    foreach ($v as $pos=>$val) {
        $ors[] = "CHROM='$chr' AND POS='$pos'";
    }
}

$deleteConditions = "(" . implode(") OR (", $ors) . ")";
$query = mysqli_query($config, "DELETE FROM sec WHERE " . $deleteConditions);

Untested, but this should give you a single query like:

DELETE FROM 
  sec 
WHERE 
  (CHROM='1' AND POS='2') OR 
  (CHROM='3' AND POS='4') OR 
  (CHROM='5' AND POS='6') OR 
  ...

depending on what $chr and $pos are.

*As Ollie Jones noted in the comments, watch the overall query length. If necessary, create a second, third, ... query until you have processed all items in appropriately sized batches.

4 Comments

You're gonna get slammed by max_allowed_packet if you try this. Read this: stackoverflow.com/questions/16335011/…
@OllieJones Of course, this has to be considered. But to delete 10,000 rows, you are better off deleting in batches than firing single queries.
I'm not deleting the lines from MySQL; I delete the lines JUST from the 2D array.
@Idoroni, that changes a lot... (I'll leave the answer up so people can see this statement.) I had read your question as "how to delete database rows that are IN the 2D array."
