Postgres SQL UDF Optimization (Not Inlining?)

Question

I have a problem where I need to round timestamps representing end dates to the end of the current month or end of the prior month depending on where those timestamps fall relative to NOW() assuming the timestamp is in the same month as NOW(). I'm basically just rounding timestamps to the end of a particular month with some logic about which month to pick. There are two helper functions I'm using to make my date math a bit easier, one which converts dates to the start of the month and one which converts dates to the last day of the month:

CREATE FUNCTION first_of_month(val date) RETURNS date
    IMMUTABLE
    STRICT
    PARALLEL SAFE
    LANGUAGE sql
RETURN (DATE_TRUNC('MONTH'::text, (val)::timestamp WITH TIME ZONE))::date;

CREATE FUNCTION last_of_month(val date) RETURNS date
    IMMUTABLE
    STRICT
    PARALLEL SAFE
    LANGUAGE sql
RETURN ((DATE_TRUNC('MONTH'::text, (val)::timestamp WITH TIME ZONE) + '1 mon -1 days'::interval))::date;

The main function I'm calling in my operation is as follows:

CREATE FUNCTION rounded_end(end_ts timestamp WITH TIME ZONE, now_ts timestamp WITH TIME ZONE) RETURNS date
    IMMUTABLE
    STRICT
    PARALLEL SAFE
    LANGUAGE sql
RETURN CASE
           WHEN ((first_of_month((now_ts)::date) = first_of_month((end_ts)::date)) AND (end_ts >= now_ts))
               THEN last_of_month((end_ts)::date)
           ELSE (first_of_month((end_ts)::date) - 1) END;

All of these functions are written in SQL and marked IMMUTABLE with the goal of allowing them to be inlined, per information in this wiki.

However, I'm seeing wildly different performance results depending on whether I call rounded_end or inline it manually.

For example, this sample call which uses rounded_end takes 8-9s for me locally:

WITH timestamps AS (SELECT GENERATE_SERIES(timestamp '2014-01-10 20:00:00' +
                                           RANDOM() * (timestamp '2014-01-20 20:00:00' -
                                                       timestamp '2014-01-10 10:00:00'),
                                           timestamp '2025-01-10 20:00:00' +
                                           RANDOM() * (timestamp '2025-01-20 20:00:00' -
                                                       timestamp '2025-01-10 10:00:00'),
                                           '10 minutes') AS ts)
SELECT rounded_end(ts, NOW())
FROM timestamps;

While this sample call which manually inlines the body of rounded_end runs in under 2s:

WITH timestamps AS (SELECT GENERATE_SERIES(timestamp '2014-01-10 20:00:00' +
                                           RANDOM() * (timestamp '2014-01-20 20:00:00' -
                                                       timestamp '2014-01-10 10:00:00'),
                                           timestamp '2025-01-10 20:00:00' +
                                           RANDOM() * (timestamp '2025-01-20 20:00:00' -
                                                       timestamp '2025-01-10 10:00:00'),
                                           '10 minutes') AS ts)
SELECT CASE
           WHEN ((first_of_month((NOW())::date) = first_of_month((ts)::date)) AND (ts >= NOW()))
               THEN last_of_month((ts)::date)
           ELSE (first_of_month((ts)::date) - 1) END
FROM timestamps;

A DB Fiddle with a repro and ANALYZE timings is here: https://dbfiddle.uk/CtQxpa3S. I'm running on Postgres 15.

What gives? I can see in the results of ANALYZE that my function isn't being inlined, although all dependencies in the callstack should be IMMUTABLE, unless I'm missing something. What's preventing inlining?
For first_of_month and last_of_month, I would very happily precompute a mapping of all possible dates in the time range my data covers to what those results look like if it would be a performance optimization. Is there a strategy in Postgres for doing this, basically building a hashmap of all dates between 2014 and 2025 and what their respective first and last of months would be? I know I could write a new table and do a JOIN, but I'm wondering if there's a cheaper solution than a JOIN available out there.

Thanks much for the help!

I think that the issue might be as deep as last_of_month doing interval math, which this answer suggests is not immutable. Trying to see if there's a workaround, and still curious about strategies for precomputing first and last of month dates. — bhub
– bhub, Commented Apr 2, 2024 at 18:14

Zegarek · Accepted Answer · 2024-04-03 18:49:47Z

1

That's because there are a few variants of date_trunc() and only 1 of the 3 you're interested in, is immutable:

\dfS+ date_trunc

Name	Result data type	Argument data types	Volatility	Parallel	Description
date_trunc	interval	text, interval	immutable	safe	truncate interval to specified units
date_trunc	timestamp with time zone	text, timestamp with time zone	stable	safe	truncate timestamp with time zone to specified units
date_trunc	timestamp with time zone	text, timestamp with time zone, text	stable	safe	truncate timestamp with time zone to specified units in specified time zone
date_trunc	timestamp without time zone	text, timestamp without time zone	immutable	safe	truncate timestamp to specified units

In your helper functions, by casting to ::timestamp with timezone you're explicitly calling the 2-argument stable one, which breaks this rule in the list on the wiki you referenced:

if the function is declared IMMUTABLE, then the expression must not invoke any non-immutable function or operator

My guess is when you "inline manually" it's possible for the planner/optimizer to see you're passing a date to be cast to timestamp with timezone inside the date_trunc() call, and since ::date ignores timezones, it can ignore that and use the much more preferable immutable variant taking timestamp withOUT timezone.

Since you're casting to timezone-insensitive ::date before, along the way and after, you can cast to timestamp withOUT timezone in the helper functions and keep the immutability obvious to the planner.

As you established, that combined with the removal of strict enables all 3 functions to be inlined:

planner must be able to prove that the body expression necessarily returns NULL if any parameter is null.

Your case statement definitely isn't strict according to that: you can pass a null value in now_ts and it'll still return a non-null result going down the else branch and using end_ts. Even if you add a null-check, I think the whole case construct might qualify as non-strict regardless:

At present, this condition is only satisfied if: every parameter is referenced at least once, and all functions, operators and other constructs used in the body are themselves STRICT.

Demo at db<>fiddle

--switching from non-immutable timestamptz to timestamp
DROP FUNCTION IF EXISTS first_of_month(date) CASCADE;
CREATE FUNCTION first_of_month(val date) RETURNS date
    IMMUTABLE STRICT PARALLEL SAFE LANGUAGE sql
RETURN (DATE_TRUNC('MONTH'::text, (val)::timestamp))::date;

--why not let a helper function help a helper function
DROP FUNCTION IF EXISTS last_of_month(date) CASCADE;
CREATE FUNCTION last_of_month(val date) RETURNS date
    IMMUTABLE STRICT PARALLEL SAFE LANGUAGE sql
RETURN (first_of_month(val) + '1 mon -1 days'::interval)::date;

--removing the `strict`
DROP FUNCTION IF EXISTS rounded_end(  timestamp WITH TIME ZONE
                                    , timestamp WITH TIME ZONE) CASCADE;
CREATE FUNCTION rounded_end(end_ts timestamp, now_ts timestamp) RETURNS date
    IMMUTABLE /*STRICT*/ PARALLEL SAFE LANGUAGE sql
RETURN CASE WHEN first_of_month(now_ts::date) = first_of_month(end_ts::date)
             AND end_ts >= now_ts
            THEN last_of_month((end_ts)::date)
            ELSE (first_of_month((end_ts)::date) - 1) 
             END;

QUERY PLAN
CTE Scan on timestamps (cost=5.03..87.53 rows=1000 width=4) (actual time=0.016..10.193 rows=8665 loops=1)
Output: CASE WHEN (((date_trunc('MONTH'::text, (((now())::timestamp without time zone)::date)::timestamp without time zone))::date = (date_trunc('MONTH'::text, ((timestamps.ts)::date)::timestamp without time zone))::date) AND (timestamps.ts >= (now())::timestamp without time zone)) THEN (((date_trunc('MONTH'::text, ((timestamps.ts)::date)::timestamp without time zone))::date + '1 mon -1 days'::interval))::date ELSE ((date_trunc('MONTH'::text, ((timestamps.ts)::date)::timestamp without time zone))::date - 1) END
CTE timestamps
-> ProjectSet (cost=0.00..5.03 rows=1000 width=8) (actual time=0.007..0.936 rows=8665 loops=1)
Output: generate_series(('2014-01-10 20:00:00'::timestamp without time zone + (random() * '10 days 10:00:00'::interval)), ('2015-01-10 20:00:00'::timestamp without time zone + (random() * '10 days 10:00:00'::interval)), '01:00:00'::interval)
-> Result (cost=0.00..0.01 rows=1 width=0) (actual time=0.001..0.001 rows=1 loops=1)
Planning:
Buffers: shared hit=6
Planning Time: 0.176 ms
Execution Time: 10.850 ms

edited Apr 3, 2024 at 18:49

answered Apr 2, 2024 at 18:36

Zegarek

29.8k5 gold badges27 silver badges32 bronze badges

Sign up to request clarification or add additional context in comments.

4 Comments

bhub Over a year ago

I think that might be part of it! I could be wrong, but I don't think the function is being inlined even still. I just inlined all component calls completely, and it has by far the best performance of all of the options: in this fiddle here. See this answer on intervals not being immutable -- what do you think about that?

bhub Over a year ago

Ended up making progress here, the solution after tweaking DATE_TRUNC to use a TZ-unaware timestamp seems mostly related to getting rid of the STRICT qualifier

bhub Over a year ago

I ended up determining that the CASE statement is somehow not able to be inlined with STRICT on, so I relaxed that constraint on rounded_end and passed TZ-unaware timestamps the whole way through. Final fiddle is here: demo, which clearly shows the full function being inlined and is 100x faster than the approach I started with. I don't know why the CASE statement is a holdup; I suppose the planner isn't able to analyze it closely. Turns out it's extremely delicate to get this inlining logic working haha!

Zegarek Over a year ago

@BenH Good catch. The answer is on that wiki: planner must be able to prove that the body expression necessarily returns NULL if any parameter is null. - your case statement definitely isn't strict according to that: you can pass a null value in now_ts and it'll still return a non-null result going down the else branch and using end_ts. Even if you add a null-check, I think the whole case construct might qualify as non-strict regardless: this condition is only satisfied if: (...)all functions, operators and other constructs used in the body are themselves STRICT.

Collectives™ on Stack Overflow

Postgres SQL UDF Optimization (Not Inlining?)

1 Answer 1

4 Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

4 Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related