
I am trying to update a MySQL database using Perl. I have tried different approaches, and all of them are super slow: it takes 10-15 minutes to update 14k records. I thought the best way would be to create a stored procedure and call it from the Perl script, but the response time is still the same.

elsif($update eq "incrementalLoad") {
    $filename = param("customPricing");
    $handle = upload("customPricing");
    while (<$handle>)  {                
        $currentRecord++;
        @currentLine = split( /,/, $_ );
        $i = 0;
        foreach $l(@currentLine){
            $currentLine[$i] =~ s/\r//g;   # strip carriage returns from Windows line endings
            $i++;           
        }       
        $query = "CALL upsertIncremental('$currentLine[0]', '$currentLine[1]', '$currentLine[2]', '$currentLine[3]', '$currentLine[4]', '$currentLine[5]', '$currentLine[6]', '$currentLine[7]', '$currentLine[8]', '$currentLine[9]', '$currentLine[10]', '$currentLine[11]', '$currentLine[12]', '$currentLine[13]', '$currentLine[14]', '$currentLine[15]', '$currentLine[16]', '$currentLine[17]', '$currentLine[18]', '$currentLine[19]', '$currentLine[20]', '$currentLine[21]', '$currentLine[22]', '$currentLine[23]', '$currentLine[24]', '$currentLine[25]')";
        $sth = $dbh->do($query) or die "Afasdf";
    }   
    print $currentRecord . " Record(s) uploaded.<br/>";
    $dbh->disconnect;           
}

What can I do to improve performance?

Is this better?

elsif($update eq "incrementalLoad") {
    $filename = param("customPricing");
    $handle = upload("customPricing");

    my $update_handle = $dbh->prepare_cached("UPDATE custompricingtest SET partNumberSKU= ?, customerClass= ?, customerName= ?, customerId= ?, customerNumber= ?, custPartNumber=?, svcType= ?, sppl= ? , svcDuration= ?, durationPeriod= ?, priceMSRP= ?, partnerPriceDistiDvarOEM= ?, msrpSvcPrice=?, partnerSvcPrice=?, msrpBundlePrice=?, partnerBundlePrice=?, startDate=?, endDate=?, currency=?, countryCode=?, inventoryItemId=?, flexField1=?, flexField2=?, flexField3=?, flexField4=?, flexField5=? WHERE partNumberSKU=? and ifnull(customerClass,0)=ifnull(?,0) and ifnull(customerName,0)=ifnull(?,0) and ifnull(svcType,0)=ifnull(?,0) and ifnull(svcDuration,0)=ifnull(?,0) and ifnull(durationPeriod,0)=ifnull(?,0)") or $error = 1;

    while (<$handle>)  {                
        $currentRecord++;
        @currentLine = split( /,/, $_ );
        $i = 0;
        foreach $l(@currentLine){
            $currentLine[$i] =~ s/\r//g;   # strip carriage returns from Windows line endings
            $i++;           
        }       
        $update_handle->execute($currentLine[0],$currentLine[1],$currentLine[2],$currentLine[3],$currentLine[4],$currentLine[5],$currentLine[6],$currentLine[7],$currentLine[8],$currentLine[9],$currentLine[10],$currentLine[11],$currentLine[12],$currentLine[13],$currentLine[14],$currentLine[15],$currentLine[16],$currentLine[17],$currentLine[18],$currentLine[19],$currentLine[20],$currentLine[21],$currentLine[22],$currentLine[23],$currentLine[24],$currentLine[25],$currentLine[0],$currentLine[1],$currentLine[2],$currentLine[6],$currentLine[8],$currentLine[9]) or die "can't execute UPDATE  query. \n";
        print $currentRecord . "<br/>";
    }   
    print $currentRecord . " Record(s) uploaded.<br/>";
    $dbh->disconnect;           
}

Table format

CREATE TABLE `custompricingtest` (
  `partNumberSKU` varchar(255) DEFAULT NULL,
  `customerClass` varchar(255) DEFAULT NULL,
  `customerName` varchar(255) DEFAULT NULL,
  `customerId` varchar(255) DEFAULT NULL,
  `customerNumber` varchar(255) DEFAULT NULL,
  `custPartNumber` varchar(255) DEFAULT NULL,
  `svcType` varchar(255) DEFAULT NULL,
  `sppl` varchar(255) DEFAULT NULL,
  `svcDuration` varchar(255) DEFAULT NULL,
  `durationPeriod` varchar(255) DEFAULT NULL,
  `priceMSRP` varchar(255) DEFAULT NULL,
  `partnerPriceDistiDvarOEM` varchar(255) DEFAULT NULL,
  `msrpSvcPrice` varchar(255) DEFAULT NULL,
  `partnerSvcPrice` varchar(255) DEFAULT NULL,
  `msrpBundlePrice` varchar(255) DEFAULT NULL,
  `partnerBundlePrice` varchar(255) DEFAULT NULL,
  `startDate` varchar(255) DEFAULT NULL,
  `endDate` varchar(255) DEFAULT NULL,
  `currency` varchar(255) DEFAULT NULL,
  `countryCode` varchar(255) DEFAULT NULL,
  `inventoryItemId` varchar(255) DEFAULT NULL,
  `flexField1` varchar(255) DEFAULT NULL,
  `flexField2` varchar(255) DEFAULT NULL,
  `flexField3` varchar(255) DEFAULT NULL,
  `flexField4` varchar(255) DEFAULT NULL,
  `flexField5` varchar(255) DEFAULT NULL,
  KEY `part_num_sku` (`partNumberSKU`),
  KEY `svcType` (`svcType`),
  KEY `svcDuration` (`svcDuration`),
  KEY `durationPeriod` (`durationPeriod`),
  KEY `customerClass` (`customerClass`)
) 

I ended up saving the file into a temp table (which takes a couple of seconds) and then doing an update join between the two tables, but it is still really slow. Below is the update query:

UPDATE custompricingtest t1, custompricingtesttemp t2 
SET t1.customerId         = t2.customerId, 
    t1.customerNumber     = t2.customerNumber, 
    t1.custPartNumber     = t2.custPartNumber, 
    t1.sppl               = t2.sppl , 
    t1.priceMSRP          = t2.priceMSRP, 
    t1.partnerPriceDistiDvarOEM = t2.partnerPriceDistiDvarOEM,
    t1.msrpSvcPrice       = t2.msrpSvcPrice, 
    t1.partnerSvcPrice    = t2.partnerSvcPrice, 
    t1.msrpBundlePrice    = t2.msrpBundlePrice,
    t1.partnerBundlePrice = t2.partnerBundlePrice, 
    t1.startDate          = t2.startDate, 
    t1.endDate            = t2.endDate, 
    t1.currency           = t2.currency, 
    t1.countryCode        = t2.countryCode, 
    t1.inventoryItemId    = t2.inventoryItemId, 
    t1.flexField1         = t2.flexField1, 
    t1.flexField2         = t2.flexField2, 
    t1.flexField3         = t2.flexField3, 
    t1.flexField4         = t2.flexField4, 
    t1.flexField5         = t2.flexField5
WHERE t1.partNumberSKU  = t2.partNumberSKU 
  and t1.customerClass  = t2.customerClass 
  and t1.customerName   = t2.customerName 
  and t1.svcType        = t2.svcType 
  and t1.svcDuration    = t2.svcDuration 
  and t1.durationPeriod = t2.durationPeriod

EXPLAIN EXTENDED result:

id  select_type  table  type  possible_keys                                                                key           key_len  ref                           rows   Extra
1   SIMPLE       t2     ALL   part_num_sku,customerName,customerClass,durationPeriod,svcDuration,svcType   NULL          NULL     NULL                          28758
1   SIMPLE       t1     ref   part_num_sku,svcDuration,customerName,customerClass,durationPeriod,svcType   part_num_sku  13       testbrocade.t2.partNumberSKU  394    Using where

Nitesh

  • Disable indexes for the duration of the import, then re-enable/update indexes afterwards. Commented Aug 4, 2011 at 16:27
  • I tried that... still no use. Is the code I posted at the end better than the others? Commented Aug 4, 2011 at 17:00
  • The prepared statement will cut down on SQL parsing/compiling overhead, but most likely the bottleneck is somewhere else. Commented Aug 4, 2011 at 17:09
  • That's what I have been trying to figure out for 3 days... Is it because of my complicated update statement? I am setting around 25 columns and checking against 5 columns with ifnull() conditions. Commented Aug 4, 2011 at 17:13
  • Possibly. Since you're comparing derived values in the WHERE clause, use of indexes could be precluded. Try running one of those queries by hand in the mysql monitor and EXPLAIN it, to see which indexes are being used. If indexes on those ifnull()'d fields can't be used, then you'd be doing a bunch of whole-table scans for each row. Commented Aug 4, 2011 at 17:15

5 Answers


Don't use a stored procedure; do multiple rows in the same statement. Depending on your max_allowed_packet, you should be able to do all your rows in one or just a few insert statements.

Oops, sorry, somehow I misread update as insert. Updating is harder; I'll try to show an example if you show your CREATE TABLE.
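
A minimal sketch of the multi-row idea, assuming the rows are being loaded into the temp table from the question and that $dbh and $handle are set up as in the question's code; the batch size of 500 is an arbitrary guess to keep each statement under max_allowed_packet:

    # Build multi-row INSERTs instead of issuing one statement per CSV row.
    my @rows;
    my $flush = sub {
        return unless @rows;
        $dbh->do('INSERT INTO custompricingtesttemp VALUES ' . join(',', @rows))
            or die $dbh->errstr;
        @rows = ();
    };
    while (<$handle>) {
        s/\r?\n\z//;                      # strip CRLF/LF line endings
        my @f = split /,/, $_, -1;        # -1 keeps trailing empty fields
        push @rows, '(' . join(',', map { $dbh->quote($_) } @f) . ')';
        $flush->() if @rows >= 500;       # flush a full batch
    }
    $flush->();                           # flush the final partial batch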


10 Comments

Thanks for the reply... can you give me a bit more detail on how to do multiple rows in the same statement?
How about using LOAD DATA INFILE with the REPLACE option? Can't I achieve the same functionality with super speed?
I added improved code at the end... is that what you meant, or is there room for further optimization?
For 14k rows, I don't know that I'd bother with LOAD DATA INFILE. Can you add the output of SHOW CREATE TABLE custompricingtest to your question?
I just did... later I realized that I can't use LOAD DATA INFILE because the condition for the update is based on non-unique columns. The 14k rows are taking forever to update, maybe because of my complicated update query.

Start a transaction, do all the inserts, and commit at the end. This frees MySQL from managing a separate transaction for each insert.
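
If the table is InnoDB (MyISAM ignores transactions), a minimal sketch of this, reusing the $update_handle and CSV loop from the question:

    $dbh->{AutoCommit} = 0;               # open one transaction for the whole load
    while (<$handle>) {
        s/\r?\n\z//;
        my @f = split /,/, $_, -1;
        # 26 SET values followed by the 6 WHERE values, as in the question
        $update_handle->execute(@f[0 .. 25], @f[0, 1, 2, 6, 8, 9])
            or die $dbh->errstr;
    }
    $dbh->commit;                         # a single commit instead of one per row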

1 Comment

Did you try this approach? Even in the bounty answer, I recommend doing all the inserts to the temp table in a single transaction, which boosts performance.

I've done a script like this before. You could do it with bookends.

  1. The initialization step is to create a temp table: guaranteed to be out of contention, and no indexes are needed.
  2. The per-record action is either batch inserts or a proc,
  3. and then the final step/proc is to commit the changes from the temp table to the main table in a transaction (and dispose of the temp table).

For this last step, you might even need to drop and create indexes to speed that up.
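
A rough SQL outline of those bookends, assuming the temp table mirrors the main one (only two SET columns shown; the full list is the same as in the question's update query):

    -- Step 1: temp table; LIKE also copies the keys, which helps the join.
    CREATE TEMPORARY TABLE custompricingtesttemp LIKE custompricingtest;

    -- Step 2: batch-insert the uploaded rows into custompricingtesttemp.

    -- Step 3: commit the changes to the main table in one transaction.
    START TRANSACTION;
    UPDATE custompricingtest t1
    JOIN   custompricingtesttemp t2
      USING (partNumberSKU, customerClass, customerName,
             svcType, svcDuration, durationPeriod)
    SET    t1.priceMSRP                = t2.priceMSRP,
           t1.partnerPriceDistiDvarOEM = t2.partnerPriceDistiDvarOEM;
    COMMIT;
    DROP TEMPORARY TABLE custompricingtesttemp;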

3 Comments

Thanks for the reply... do you mind sharing your script? I am kind of a newbie at this.
I added the update join of the two tables like you said... it's still taking forever. I added the code above in the question... am I doing anything wrong?
You might want to "chunk" it by setting rowcount: update and delete, update and delete. Once or twice on a big system with a lot of data, we couldn't run an update script except on a Saturday. Sometimes the runtime is just the runtime.

I don't know what your data looks like, but if you have an indexed column that is purely numeric (partNumberSKU?) you might consider changing the column type to int. That should make key comparisons faster. If you're using your temp table approach, you'll want to make the same change in each table (and make sure that both are indexed).

If you're playing around with which columns to index, aim for those that have a large number of unique values.
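
For example, if partNumberSKU turned out to be purely numeric, something like this hypothetical change would shrink the key and speed comparisons (only safe after verifying every existing value really is numeric):

    ALTER TABLE custompricingtest     MODIFY partNumberSKU INT UNSIGNED;
    ALTER TABLE custompricingtesttemp MODIFY partNumberSKU INT UNSIGNED;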

1 Comment

If you want to try a more dramatic solution, you could convert all your varchar(255) columns to char(255) (or whatever size is actually necessary). You'd use much more storage space, and might lose some performance due to the increased size of the database, but my understanding is that having fixed-length rows could speed updates significantly. This probably isn't an ideal solution, and it would only help if you're using MyISAM tables. See this post for more details.
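
For example (one column shown; the same MODIFY would be repeated for each varchar column, and it only pays off once every variable-length column is gone, so MyISAM can use its fixed row format):

    ALTER TABLE custompricingtest MODIFY customerName CHAR(255);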

The main problem, I think, is that MySQL can use only one index per table per query. You have indexed your columns separately:

KEY `part_num_sku` (`partNumberSKU`),
KEY `svcType` (`svcType`),
KEY `svcDuration` (`svcDuration`),
KEY `durationPeriod` (`durationPeriod`),
KEY `customerClass` (`customerClass`)

But it will only be able to use one of these keys (the optimiser tries to work out which one). Maybe you need a composite key comprising the columns you match on? The index will be large, but it may help. Or it may be so large that it does not.

Remember: in a MySQL composite index, the columns have to be used in the query in left-to-right order or the index will not be used. This may help: http://dev.mysql.com/doc/refman/5.0/en/mysql-indexes.html
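
A sketch of such a composite key over the six columns the update matches on; the prefix lengths are assumptions to keep six varchar(255) columns under MySQL's index-length limit, so adjust them to fit your data:

    ALTER TABLE custompricingtest
      ADD KEY match_cols (partNumberSKU(32), customerClass(16), customerName(32),
                          svcType(16), svcDuration(8), durationPeriod(8));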

