
I've written a SQL query that I want to wrap in a function. It collects data between two timestamps and returns it to the user. Passing the timestamps in as parameters slows the function down enormously, while hardcoding the exact same timestamps into the query performs fine.

Any ideas what could possibly be happening?

First function:

CREATE OR REPLACE FUNCTION my_variable_func(
   id text,
   start_time timestamp with time zone DEFAULT '2022-05-24 07:10:00',
   end_time timestamp with time zone DEFAULT '2022-05-24 07:20:00')
RETURNS TABLE(... some stuff...) 
LANGUAGE 'sql'
COST 100
VOLATILE PARALLEL SAFE 
ROWS 1000

AS $BODY$
SELECT 
  ... some complex CTE + window query + aggregate ...
WHERE event_time > $2::timestamp with time zone
AND event_time < $3::timestamp with time zone
$BODY$;

Fast hardcoded query:

CREATE OR REPLACE FUNCTION my_hardcoded_func(
   id text,
   start_time timestamp with time zone DEFAULT '2022-05-24 07:10:00',
   end_time timestamp with time zone DEFAULT '2022-05-24 07:20:00')
RETURNS TABLE(... some stuff...) 
LANGUAGE 'sql'
COST 100
VOLATILE PARALLEL SAFE 
ROWS 1000

AS $BODY$
SELECT 
  ... some complex CTE + window query + aggregate ...
WHERE event_time > '2022-05-24 07:10:00'::timestamp with time zone
AND event_time < '2022-05-24 07:20:00'::timestamp with time zone
$BODY$;

Running

SELECT * FROM my_variable_func('123', '2022-05-24 07:10:00', '2022-05-24 07:20:00');

completes in 30 sec while

SELECT * FROM my_hardcoded_func('123', '2022-05-24 07:10:00', '2022-05-24 07:20:00');

takes under 2 seconds. Any ideas? I'm stumped...

EDIT

After some more work: looking at the locks on the database, it appears that the "hardcoded" function fetches data from only the correct partitions, while the "variable" function searches through all partitions to get the data it requires, which is massively slow. I'm not sure what has changed, since the variable function now takes MUCH longer than 30 seconds to run. I've tried to force index use with "set enable_seqscan = off;" without success.
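For what it's worth, a quick way to check pruning is to run the filter directly against the underlying table with literal bounds; with plan-time pruning the plan should list only the matching weekly partitions. The table name events below is an assumption, standing in for the actual partitioned table:

EXPLAIN (COSTS OFF)
SELECT count(*)
FROM events  -- hypothetical name for the partitioned table
WHERE event_time > '2022-05-24 07:10:00'::timestamptz
  AND event_time < '2022-05-24 07:20:00'::timestamptz;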

2 Comments

  • No, my_variable_func always takes around 20-30 sec; caching doesn't seem to happen. my_hardcoded_func might take a little longer on the first run (but nowhere near 20 sec) and after that settles to around 1.5-2 sec. The only difference between the queries is whether they use the start_time/end_time parameters or the exact same hardcoded values...
  • Try moving or duplicating the date-range conditions inside the "some complex CTE" part.

1 Answer


That is very likely the effect of plan caching. PostgreSQL can cache execution plans for the queries inside PL/pgSQL functions. Usually that is a win, because the query doesn't have to be planned every time it is executed, but sometimes things can go wrong.

The first five times you call my_variable_func in a database session, it will generate a custom plan that takes the actual parameter values into account. From the sixth execution on, it will check if a generic plan (one that does not take the actual parameter values into account) promises to be as efficient as the custom plans. If PostgreSQL deems that to be the case, it will use the generic plan from then on.

It seems that this heuristic goes wrong in your case, and the generic plan is actually less efficient. You can tell this is happening if the first five executions in a new database session are fast but the sixth one is slow (a way to observe the switch with a plain prepared statement is sketched after the list below). In that case, try the following:

  1. Try to provide better statistics:

    ALTER TABLE tab SET (autovacuum_analyze_scale_factor = 0.01);
    ANALYZE tab;
    

    Then start a new database session and try again. Perhaps collecting table statistics often enough already solves the problem.

  2. If that is not enough, disable generic plans for that function:

    ALTER FUNCTION my_variable_func SET plan_cache_mode = force_custom_plan;
    

    Then PostgreSQL will never again use generic plans for that function.
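As an aside, you can watch the custom-to-generic switch happen with a plain prepared statement, which uses the same plan cache machinery as the cached function plan; the table name events and the count(*) query here are just placeholders:

PREPARE q(timestamptz, timestamptz) AS
  SELECT count(*)
  FROM events              -- hypothetical table name
  WHERE event_time > $1 AND event_time < $2;

-- Repeat this six or more times in the same session: the first five plans
-- are custom plans with the parameter values substituted; from the sixth
-- execution on, PostgreSQL may switch to a generic plan that shows $1/$2.
EXPLAIN EXECUTE q('2022-05-24 07:10:00', '2022-05-24 07:20:00');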


5 Comments

  • The table I'm getting the bulk of the data from (and doing the time-based filter on) is around 500 GB and gets 100 to 200 inserts per second... It's also partitioned into weekly chunks. Would altering the autovacuum settings be a good idea here?
  • I started a new session, set plan_cache_mode to "force_custom_plan" and reran the SELECT from my_variable_func. If anything, performance is worse now; the query has been running for around 10 minutes. Any ideas?
  • OK, then it must be a different problem, and my guess was wrong. In that case you have to go to the effort of capturing EXPLAIN (ANALYZE, BUFFERS) output from a slow execution (perhaps using auto_explain; a minimal setup is sketched after these comments) and add it to the question.
  • I ran ANALYZE on the big table; it took a couple of hours, and now my variable queries are as fast as the hardcoded ones. Thanks! I kind of wish I had more info on how to prevent this in the future, but at least I know what went wrong (sort of).
  • Ah. Then the first part of my answer should fix the problem, which was obviously outdated statistics.
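For reference, a minimal auto_explain setup for capturing the plan of a query that runs inside a function might look like this; the threshold values are illustrative, and LOAD requires superuser rights unless the library is preloaded by the server:

LOAD 'auto_explain';
SET auto_explain.log_min_duration = '5s';    -- only log statements slower than 5 seconds
SET auto_explain.log_analyze = on;           -- include actual timings and row counts
SET auto_explain.log_buffers = on;           -- equivalent to EXPLAIN (ANALYZE, BUFFERS)
SET auto_explain.log_nested_statements = on; -- required to see statements inside functions

SELECT * FROM my_variable_func('123', '2022-05-24 07:10:00', '2022-05-24 07:20:00');
-- The plan of the slow inner query is then written to the server log.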
