
Requirement:

We have two similar tables on two servers. The first table (Table1) has a unique key on columns A, B, C, and we are inserting Table1 rows into Table2, which has a unique key on columns B, C, D.

Table1 has approximately 5 million rows, and only about 0.3 million rows will end up in Table2 because of its different unique key constraint.

The requirement is to fetch all the rows from Table1 and insert them into Table2 if no matching record exists there; when a record matches, increase the count and update the 'cron_modified_date' column in Table2.

PHP version is 5.5 and MySQL version is 5.7 for this setup, and the DB server has 6 GB of RAM.

While executing the script below, processing speed gets very slow after about 2 million records, RAM is not freed up, and after some time all the RAM is consumed by the script, at which point it stops processing entirely.

As you can see, I am resetting the variables and closing the DB connection as well, but it is not freeing up the DB server's RAM. After some reading, I learned that PHP garbage collection may need to be invoked manually to free up resources, but that is not freeing the RAM either.
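One way to check whether the RAM is actually being held by the PHP process or by the MySQL server is memory_get_usage(); the snippet below is only an illustrative sketch (gc_collect_cycles() can only ever reclaim PHP-side memory):

    // Diagnostic sketch: if these numbers stay flat while the DB server's RAM
    // keeps growing, the leak is on the MySQL side, not in PHP's GC.
    printf("Before fetch: %.1f MB\n", memory_get_usage(true) / 1048576);
    $rows = $GLOBALS['db']->execRaw("SELECT * FROM TABLE1 LIMIT 50000")->fetchAll();
    printf("After fetch:  %.1f MB\n", memory_get_usage(true) / 1048576);
    $rows = null;
    gc_collect_cycles();
    printf("After GC:     %.1f MB\n", memory_get_usage(true) / 1048576);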

What am I doing wrong here, and how can I process millions of records with PHP and MySQL?

Is there any other way to free up RAM while the script is executing, so that the script can complete its execution?

/* Fetch records count for batch insert*/

$queryCount = "SELECT count(*) as totalRecords FROM TABLE1 WHERE created_date >= '2018-02-10'";
$rowsCount = $GLOBALS['db']->execRaw( $queryCount)->fetchAll();

$recordsPerIteration = 50000 ;
$totalCount = $rowsCount[0]['totalRecords']; 
$start = 0;

gc_disable() ;
if ( $totalCount > 0 ) {
    while ( $totalCount > 0 ) {
    $query = "SELECT *  FROM TABLE1
                WHERE where created_date > = '2018-02-10'
                ORDER BY suggestion_id DESC 
                LIMIT ".$start.",".$recordsPerIteration;

    print "sql is $query" ;

    $getAllRows = $GLOBALS['db']->execRaw( $query )->fetchAll();
    $GLOBALS['db']->queryString = null;
    $GLOBALS['db']->close() ;

    foreach ($getAllRows as  $getRow) {

        $insertRow  = " INSERT INTO TABLE2 (
                            Name,
                            Company,
                            ProductName,
                            Status,
                            cron_modified_date)
                VALUE (   
                            ".$GLOBALS['db_ab']->quote($getRow['Name']).", 
                            ".$GLOBALS['db_ab']->quote($getRow['Company']).", 
                            ".$GLOBALS['db_ab']->quote($getRow['ProductName']).",
                            ".$getRow['Status'].",
                            ".$GLOBALS['db_ab']->quote($getRow['created_date'])."
                        )
                    ON DUPLICATE KEY UPDATE count = (count + 1) , cron_modified_date =  '".$getRow['created_date']."'" ;

                $GLOBALS['db_ab']->execRaw( $insertRow ) ;
                $GLOBALS['db_ab']->queryString = null;
                $getRow = null;
                $insertRow = null;
                $GLOBALS['db_ab']->close() ;
           }
          gc_enable() ;
          $totalCount   = $totalCount- $recordsPerIteration;
          $start        += $recordsPerIteration ;
          $getAllRows = null;
          gc_collect_cycles() ;
    }

}

Solution


After the suggestions provided by @ABelikov and a few hit-and-trial methods... the code below is finally working perfectly and frees up the RAM after every 50K-record insert.

Below are the major findings:

  • Free up the DB connection variables after every major operation that involves large data, and reconnect the DB so that the DB buffers are flushed (see the reconnection sketch after the code below).
  • Club the insert statements together and execute the inserts in one go. Don't execute single-record insertions in a loop.

Thanks guys for the valuable suggestions and for helping out.

    /* Fetch records count for batch insert */

    $queryCount = "SELECT count(*) as totalRecords FROM TABLE1 WHERE created_date >= '2018-02-10'";
    $rowsCount = $GLOBALS['db']->execRaw( $queryCount )->fetchAll();

    $recordsPerIteration = 50000 ;
    $totalCount = $rowsCount[0]['totalRecords'];
    $start = 0;

    if ( $totalCount > 0 ) {
        while ( $totalCount > 0 ) {
            $query = "SELECT * FROM TABLE1
                        WHERE created_date >= '2018-02-10'
                        ORDER BY suggestion_id DESC
                        LIMIT ".$start.",".$recordsPerIteration;

            print "sql is $query" ;

            $getAllRows = $GLOBALS['db']->execRaw( $query )->fetchAll();
            $GLOBALS['db']->queryString = null;
            $GLOBALS['db']->close() ;

            // Build a single multi-row INSERT instead of one INSERT per record
            $insertRow = " INSERT INTO TABLE2 (
                                Name,
                                Company,
                                ProductName,
                                Status,
                                cron_modified_date)
                        VALUES " ;

            foreach ($getAllRows as $getRow) {
                $insertRow .= "(".$GLOBALS['db_ab']->quote($getRow['Name']).",
                               ".$GLOBALS['db_ab']->quote($getRow['Company']).",
                               ".$GLOBALS['db_ab']->quote($getRow['ProductName']).",
                               ".$getRow['Status'].",
                               ".$GLOBALS['db_ab']->quote($getRow['created_date'])."),";
            }

            $insertRow = rtrim($insertRow, ','); // Remove last ','
            // VALUES() uses each duplicate row's own date; a plain PHP variable
            // here would apply the last fetched row's date to every duplicate
            $insertRow .= " ON DUPLICATE KEY UPDATE count = (count + 1), cron_modified_date = VALUES(cron_modified_date)" ;

            $GLOBALS['db_ab']->execRaw( $insertRow ) ;

            // Flushing all data to free up RAM; note that both connections must
            // be re-established before the next iteration (see sketch below)
            $GLOBALS['db_ab'] = null ;
            $GLOBALS['db'] = null ;
            $insertRow = null;

            $totalCount = $totalCount - $recordsPerIteration;
            $start += $recordsPerIteration ;
            $getAllRows = null;
            print " \n Records left to process ".$totalCount."\n";
        }
    }
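For completeness: the code above sets both connection objects to null at the end of each iteration, so they must be re-created before the next SELECT runs. A minimal sketch of that reconnection step, assuming the wrappers sit on top of PDO (the connect() helper and the DSN/credential variables are hypothetical placeholders):

    // Hypothetical reconnect step for the loop above; connect() stands in for
    // however the db / db_ab wrapper objects are really constructed.
    function connect($dsn, $user, $pass) {
        $pdo = new PDO($dsn, $user, $pass);
        $pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
        return $pdo;
    }

    // At the top of each while iteration, after the previous cleanup:
    $GLOBALS['db']    = connect($srcDsn, $srcUser, $srcPass);
    $GLOBALS['db_ab'] = connect($dstDsn, $dstUser, $dstPass);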
    
5 Comments
  • @AtaurRahman: Yes, you are right, but he has an ON DUPLICATE KEY UPDATE... statement. Commented Feb 15, 2018 at 5:24
  • @SalimIbrogimov, why would that matter? As far as I know, ON DUPLICATE KEY UPDATE would work just fine. Commented Feb 15, 2018 at 5:56
  • @AtaurRahman: ON DUPLICATE KEY UPDATE count = (count + 1) , cron_modified_date = '".$getRow['created_date']."'" ; Commented Feb 15, 2018 at 6:32
  • Why do you use gc_disable();? Are you using PDO or MySQLi as the database layer? Commented Feb 15, 2018 at 8:09
  • Yes, I am using PDO, and before calling gc_collect_cycles(), I am disabling GC. Commented Feb 15, 2018 at 8:16

2 Answers


1. Insert multiple rows solution

You can speed up your script by using "insert multiple rows"; see https://dev.mysql.com/doc/refman/5.5/en/insert.html

INSERT INTO tbl_name (a,b,c) VALUES(1,2,3),(4,5,6),(7,8,9);

You only need to keep the VALUES part inside your foreach and move everything else out:

 $insertRow  = " INSERT INTO TABLE2 (
                             Name,
                             Company,
                             ProductName,
                             Status,
                             cron_modified_date) VALUES ";
 foreach ($getAllRows as  $getRow) {
     $insertRow.="(".$GLOBALS['db_ab']->quote($getRow['Name']).",
                   ".$GLOBALS['db_ab']->quote($getRow['Company']).", 
                   ".$GLOBALS['db_ab']->quote($getRow['ProductName']).",
                   ".$getRow['Status'].",
                   ".$GLOBALS['db_ab']->quote($getRow['created_date'])."),";

 }
 $insertRow=rtrim($insertRow,','); // Remove last ','
 // VALUES() lets each duplicate row keep its own date instead of the last one's
 $insertRow .= " ON DUPLICATE KEY UPDATE count = (count + 1), cron_modified_date = VALUES(cron_modified_date)" ;
 $GLOBALS['db_ab']->execRaw( $insertRow ) ;
 $GLOBALS['db_ab']->queryString = null;
 $getRow = null;
 $insertRow = null;
 $GLOBALS['db_ab']->close() ;

This will only help if your foreach body typically runs more than once.

2. MySQL sever-side solution

Try using a TRANSACTION: https://dev.mysql.com/doc/refman/5.7/en/commit.html http://php.net/manual/en/pdo.begintransaction.php

Just begin one at the start of the script and commit at the end. Depending on your server, it can really help. But be careful! It depends on your MySQL server configuration, so it needs testing.
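A minimal sketch of that idea with plain PDO ($dsn, $user and $pass are placeholders for the real connection settings, and $batches stands in for the multi-row INSERT strings built above):

    $pdo = new PDO($dsn, $user, $pass);
    $pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

    $pdo->beginTransaction();          // one transaction around the whole batch
    try {
        foreach ($batches as $insertRow) {
            $pdo->exec($insertRow);    // the multi-row INSERTs built above
        }
        $pdo->commit();                // everything becomes visible in one go
    } catch (Exception $e) {
        $pdo->rollBack();              // undo the partial batch on failure
        throw $e;
    }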


3 Comments

  • INSERT... SELECT is not the answer in this case. "We have two similar tables in two servers"
  • @RaymondNijland: Yes, agreed. Don't know how I missed it. What should I do? Remove the answer?
  • Edit the answer; you have 3 options in your answer?
  • PHP has no practical cap on memory; MySQL does.
  • Most of the time taken is in transferring data between MySQL and PHP.

Unless I misunderstood the task, you are working much too hard at it. Simply turn it over to MySQL and let it do all the work. No arrays, no chunking, just a single SQL statement:

INSERT INTO table2
          (Name, Company, ProductName, Status, cron_modified_date, count)
    SELECT Name, Company, ProductName, Status, created_date, 1
        FROM table1
    ON DUPLICATE KEY UPDATE
        count = count + 1,
        cron_modified_date = created_date;

Note that you may need to use the pseudo function VALUES() in some of the updates.
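For example, a sketch of that adjustment applied to the statement above (untested against this exact schema):

INSERT INTO table2
          (Name, Company, ProductName, Status, cron_modified_date, count)
    SELECT Name, Company, ProductName, Status, created_date, 1
        FROM table1
    ON DUPLICATE KEY UPDATE
        count = count + 1,
        cron_modified_date = VALUES(cron_modified_date);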

This avoids fetching all (or even 5000) rows into PHP, which is probably where the memory problem comes from. A simple variable in PHP takes something like 40 bytes. MySQL is designed to work with any number of rows without blowing out RAM.
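If the data really must flow through PHP (the two tables live on different servers, per the question), an unbuffered query keeps PHP's footprint small, because rows are streamed one at a time instead of being materialized by fetchAll(). A sketch with plain PDO (the DSN and credential variables are placeholders):

    // Stream rows one at a time instead of buffering the whole result set.
    $src = new PDO($srcDsn, $srcUser, $srcPass, [
        PDO::MYSQL_ATTR_USE_BUFFERED_QUERY => false, // don't hold results in PHP
    ]);
    $stmt = $src->query("SELECT * FROM TABLE1 WHERE created_date >= '2018-02-10'");
    while ($row = $stmt->fetch(PDO::FETCH_ASSOC)) {
        // insert into TABLE2 here; memory stays flat regardless of row count
    }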

