Execution time issue - PostgreSQL

Question

Below is the function that I am running on two different tables, which contain the same column names.

-- Function: test(character varying)
-- DROP FUNCTION test(character varying);
CREATE OR REPLACE FUNCTION test(table_name character varying)
  RETURNS SETOF void AS
$BODY$
DECLARE
  recordcount integer;
  j integer; 
  hstoredata hstore;
BEGIN
  recordcount:=getTableName(table_name);
  FOR j IN 1..recordcount LOOP
    RAISE NOTICE 'RECORD NUMBER IS: %',j;
    EXECUTE format('SELECT hstore(t) FROM datas.%I t WHERE id = $1', table_name) USING  j INTO   hstoredata;
    RAISE NOTICE 'hstoredata: %', hstoredata;
  END LOOP;
END;
$BODY$
LANGUAGE plpgsql VOLATILE
COST 100
ROWS 1000;

When the above function is run on a table containing 1000 rows, the time taken is around 536 ms.

When the above function is run on a table containing 10000 rows, the time taken is around 27994 ms.

Logically time taken for 10000 rows should be around 5360 ms based on the calculation from 1000 rows, but the execution time is very high.

In order to reduce execution time, please suggest what changes to be made.

What does getTableName() do? Additionally, assigning the result of a function called getTableName() to "recordcount" seems quite strange (why do you assign a "name" to a "count"?). The loop looks strange as well. Maybe if you told us what your real problem is, we could improve the function.
– user330315, Commented Sep 19, 2013 at 11:37
My real problem is that when I run the above function on 1000 rows it takes around 536 ms, and when it is run on 10000 rows it takes around 27994 ms. I would like to know why there is such a huge difference in execution time between 1000 rows and 10000 rows.
– user2664380, Commented Sep 19, 2013 at 11:59
I am replacing getTableName() with this line: "EXECUTE 'select count(*) from datas.'||table_name into recordcount;". Basically, it stores the record count of the table.
– user2664380, Commented Sep 19, 2013 at 12:05
And why do you use the number of rows to limit the values of the id column? Apparently you are dealing with all rows anyway, so why the LOOP? You could simply retrieve all rows together with the ID and get the same result.
– user330315, Commented Sep 19, 2013 at 12:08
A FOR loop will always perform increasingly worse as more rows are fed through it. SQL is at its best when one statement handles all rows at once, whereas your code here forces it into one statement per row. Why the RAISE NOTICE? I've never used that outside of troubleshooting. To echo a_horse_with_no_name: you shouldn't be asking why this doesn't perform well, you should be asking yourself why you are getting a database to do something like this.
– Twelfth, Commented Sep 19, 2013 at 18:31

Daniel Vérité · Accepted Answer · 2013-09-19 21:01:51Z

Logically time taken for 10000 rows should be around 5360 ms based on the calculation from 1000 rows, but the execution time is very high.

That calculation assumes that reading any particular row takes the same time as reading any other row, but this is not true. For instance, if there's a text column in the table and it sometimes holds large contents, those values are fetched from TOAST storage (outside the main table page) and decompressed on the fly.
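One rough way to check whether row sizes vary much is to rank the rows by their stored size. This is only a sketch: my_table is a placeholder for the real table name, and pg_column_size reports an approximate stored size (possibly compressed), so treat the numbers as an indication rather than a measurement of fetch cost.

```sql
-- my_table is a hypothetical name; substitute one of the tables in schema datas.
-- Lists the ten rows with the largest approximate stored size.
SELECT id,
       pg_column_size(t.*) AS approx_row_bytes
FROM   datas.my_table t
ORDER  BY approx_row_bytes DESC
LIMIT  10;
```

If a handful of rows are far larger than the rest, the per-row cost is unlikely to be uniform.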

In order to reduce execution time, please suggest what changes to be made.

To read all the table rows without necessarily fetching them all into memory at once, you may use a cursor. That avoids running a new query at every loop iteration. Cursors accept dynamic queries through OPEN ... FOR EXECUTE.

See Cursors in the PL/pgSQL documentation.
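A minimal sketch of the cursor approach, under some assumptions: the function name test_cursor is made up for illustration, the id column is assumed to be an integer, and the hstore extension is assumed to be installed as in the question.

```sql
-- Hypothetical variant of the question's function using one cursor
-- instead of one query per id.
CREATE OR REPLACE FUNCTION test_cursor(table_name varchar)
  RETURNS void AS
$BODY$
DECLARE
  cur refcursor;        -- unbound cursor, opened on a dynamic query below
  row_id integer;       -- assumes id is an integer column
  hstoredata hstore;
BEGIN
  -- A single query is planned and executed once; rows are fetched one at a time.
  OPEN cur FOR EXECUTE
    format('SELECT id, hstore(t) FROM datas.%I t ORDER BY id', table_name);
  LOOP
    FETCH cur INTO row_id, hstoredata;
    EXIT WHEN NOT FOUND;
    RAISE NOTICE 'RECORD NUMBER IS: %', row_id;
    RAISE NOTICE 'hstoredata: %', hstoredata;
  END LOOP;
  CLOSE cur;
END;
$BODY$
LANGUAGE plpgsql;
```

It would be called the same way as the original, for example SELECT test_cursor('my_table'); where my_table is again a placeholder.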


user330315 · Accepted Answer · 2013-09-19 21:18:16Z

As far as I can tell, you are overcomplicating things. As "recordcount" is only used to step through the ID values, I think you can do everything with a single statement instead of querying for each and every ID separately.

CREATE OR REPLACE FUNCTION test(table_name varchar)
  RETURNS void AS
$BODY$
DECLARE
   rec record;
begin
  -- one dynamic query returns every row; the alias t is required by hstore(t)
  for rec in execute format('select id, hstore(t) as hs from datas.%I t', table_name) loop
    RAISE NOTICE 'RECORD NUMBER IS: %',rec.id;
    RAISE NOTICE 'hstoredata: %', rec.hs;
  end loop;
end;
$BODY$
language plpgsql;

The only way this differs from your solution is that if an ID smaller than the row count of the table does not exist, you won't see a RECORD NUMBER message for it. On the other hand, you will see IDs that are bigger than the row count of the table.
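Whether that difference matters depends on whether the IDs really run from 1 to the row count without gaps. A quick check, as a sketch only (my_table is a placeholder name, and id is assumed to be unique and not null):

```sql
-- my_table is a hypothetical name; substitute the real table.
-- If id is unique and not null, the ids cover exactly 1..row_count
-- only when min_id = 1 and max_id = row_count.
SELECT count(*) AS row_count,
       min(id)  AS min_id,
       max(id)  AS max_id
FROM   datas.my_table;
```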

Any time you execute the same statement again and again in a loop, very, very loud alarm bells should ring in your head. SQL is optimized to deal with sets of data, not to do row-by-row processing (which is what your loop is doing).
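Part of the gap in the question may also come from how each single-row lookup is executed. If id has no index (an assumption; the question does not say), every iteration scans the whole table, so the total work grows roughly with the square of the row count. A sketch for checking the plan of one lookup, where my_table and the value 42 are placeholders:

```sql
-- my_table and 42 are hypothetical; use a real table and an existing id.
EXPLAIN ANALYZE
SELECT hstore(t)
FROM   datas.my_table t
WHERE  id = 42;
-- A "Seq Scan" node in the output suggests each loop iteration reads the
-- whole table; an "Index Scan" suggests the lookup itself is cheap.
```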

You didn't tell us what the real problem is you are trying to solve (and I fear that you have over-simplified your example) but given the code from the question, the above should be a much better solution (definitely much faster).
