2

I have a text file with 850000 individual update statements, updating a table containing 12 million records. An example of such a query is:

update bag.pand
set mutatiedatum = to_date('04-03-2013 10:03:48','DD-MM-YYYY HH:MI:SS')
where key='0321100000015282_2013022600000000_N_0'
 ;

Where 'key' has a unique index on it.

I execute the file using psql.exe -q -1 -f '/path/to/sql/file' on a 9.1 postgres/postgis database.

This is executing extremely slow: some 14-20 statements per second.

When I explain analyze a separate statement it is fast enough:

Update on pand  (cost=0.00..6.12 rows=1 width=814) (actual time=0.101..0.101 rows=0 loops=1)
  ->  Index Scan using idx_pand_key on pand  (cost=0.00..6.12 rows=1 width=814) (actual time=0.093..0.093 rows=0 loops=1)
        Index Cond: ((key)::text = '0321100000015282_2013022600000000_N_0'::text)
Total runtime: 0.237 ms

The log file contains no warnings or errors.

I understand that by using -f all 850000 statements are executed in one transaction. Could this be the reason for the slowness? Is there another solution or hint to speed things up?

12
  • What is the fraction of records you expect to touch with this update? (the explained plan touches 0 records) Commented May 11, 2013 at 11:56
  • 90-95% (on our production environment it should be 100%, but I am testing on my notebook, where the data is slightly older) Commented May 11, 2013 at 11:59
  • Is there an index on the mutatiedatum column? That would explain it. Commented May 11, 2013 at 12:01
  • BTW: ` 14-20 statements per second.` : you are executing this statement repeatedly (but with different values for the key and the timestamp - constants) ? Commented May 11, 2013 at 12:01
  • The 850k update statements will update all 12 million rows? Commented May 11, 2013 at 12:22

1 Answer 1

1

If all the updates have the same shape, you could load them into a temp-table and use that to update the bag.pand table in one batch , like this:

CREATE TABLE bag.mutaties
        ( zkey varchar NOT NULL PRIMARY KEY
        , zdate timestamp NOT NULL
        );
COPY bag.mutaties(zkey,zdate) FROM 'the_big_file';

update bag.pand dst
FROM bag.mutaties src
set mutatiedatum = src.zdate
WHERE dst.key = src.zkey
        ;

The above assumes the dates are in ISO 'yyyy-mm-dd hh:mm:ss' format.

If you are unable to create a data file with this date format, you could read the existing 'dd-mm-yyyy' dates into a string and convert them in the update statement (similar to your row-at-a-time updates:

CREATE TABLE bag.mutaties
        ( zkey varchar NOT NULL PRIMARY KEY
        , dutchdate varchar NOT NULL
        );

COPY bag.mutaties(zkey,dutchdate) FROM 'the_big_file';

update bag.pand dst
FROM bag.mutaties src
set mutatiedatum = to_date(src.dutchdate, 'DD-MM-YYYY HH:MI:SS')
WHERE dst.key = src.key
        ;

EXTRA BONUS UPDATE:

        -- This will read in the existing SQL-SCRIPT (!!)
        -- and transform it into a table with {key,datetimestamp}
CREATE TABLE bag.tekstmutaties
        ( id SERIAL NOT NULL PRIMARY KEY
        , typ INTEGER NOT NULL DEFAULT 0
        , num INTEGER NOT NULL DEFAULT 0
        , tekst varchar
        );

        -- Read in the existing script file
        -- (this needs to be in /tmp/ to avoid permission problems)
COPY bag.tekstmutaties(tekst) FROM '/tmp/bagmut.txt';

        -- Remove bagger
delete from bag.tekstmutaties
where LEFT(tekst,1) NOT IN ( 's' , 'w')
        ;

        -- Extract the timestamp
UPDATE bag.tekstmutaties
SET typ = 1
        , tekst = regexp_replace( tekst, E' *set.*to_date..', '', 'ig')
WHERE LEFT(tekst,1) IN ( 's')
        ;
UPDATE bag.tekstmutaties
SET tekst = LEFT( tekst , 19)
WHERE typ = 1
        ;

        -- Extract the key
UPDATE bag.tekstmutaties
SET typ = 2
        , tekst = regexp_replace( tekst, E' *where key..', '', 'ig')
WHERE LEFT(tekst,1) IN ( 'w')
        ;
UPDATE bag.tekstmutaties
SET tekst = regexp_replace( tekst , '[^0-9A-Z_]' , '' , 'g' )
WHERE typ = 2
        ;

        -- number the records
UPDATE bag.tekstmutaties
SET num = id
WHERE TYP=1
        ;
        -- number the records
UPDATE bag.tekstmutaties uu
SET num = src.val
FROM (
        SELECT id, lag(id) OVER (ORDER BY id) AS val
        FROM bag.tekstmutaties
        ) src
WHERE uu.TYP=2
AND src.id = uu.id
        ;

SELECT * FROM bag.tekstmutaties ORDER BY id;

        -- The final table with the {key,timestamp} pairs
CREATE TABLE bag.mutaties
        ( zkey varchar NOT NULL PRIMARY KEY
        , zdate timestamp NOT NULL
        );

        -- Fill it with self-join of teksttable
INSERT INTO bag.mutaties (zkey, zdate)
SELECT k.tekst AS zkey
        , to_date(d.tekst, E'DD-MM-YYYY HH:MI:SS' ) AS zdate
FROM bag.tekstmutaties k
JOIN bag.tekstmutaties d ON k.num = d.num
WHERE k.typ=2
AND d.typ = 1
        ;
SELECT * FROM bag.mutaties;

-- now **after verification** you can use the mutaties-table
-- to batch-update the bag.pand table
EXPLAIN ANALYZE
update bag.pand dst
FROM mutaties src
set mutatiedatum = src.zdate
WHERE dst.key = src.key
        ;
Sign up to request clarification or add additional context in comments.

8 Comments

Thanks, I will try this as a last resort. Because this would mean a considerable change to the current proces which creates the sqlfile. So for now I still want to try the update statements.
The script is generated by another process? If you have enough skills with the editor, it would take <15 minutes to transform the script into a datafile. Otherwise create a small program to do the conversion. (theoretically, you could even do it in SQL, but that would be a major feat...)
true, but this is prob not a onetime operation.... I will let run the script until tomorrow if it hasn't finished by then I will try your solution. thanks.
I already created a script that eats up your .sql script and transforms it into data. BRB
I also tried your bonus update script (thanks!), but that didn't work: As far as I understand your script, you assume statements starting with "select ...", whereas my statements start with 'update', so I will have to adjust that and it's consequences in e.g. SET tekst = LEFT( tekst , 19). Next time!
|

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.