Postgres Index Function Appears to Create Different Values Depending On When It's Called

Question

I am observing behavior I cannot explain in Postgres v9.6.

A table exists, let's call it 'orders' with a schema of

create table orders(
  id uuid primary key,
  created_at timestamp with time zone not null,
  modified_at timestamp with time zone not null,
  entity jsonb not null);

I need to create an index based on a function that uses the id and the modified_at values to create the index value. Therefore, I have

-- We put the 'A' on the front because if the beginning of the string looks like a date, it triggers
-- a postgres bug that causes sorting to skip some rows. The 'A' makes the value not look like a date.
create function sortable_timestamp_and_uuid(timestamp with time zone, uuid) returns text as
$$ select 'A' || to_char($1, 'YYYY-MM-DD HH24:MI:SS.US') || '/' || $2::text; $$
language sql immutable;

create function paging_func(timestamp with time zone, uuid) returns text as
$$ select sortable_timestamp_and_uuid($1, $2); $$
language sql immutable;

create index my_paging_idx on orders( paging_func(modified_at, id) );

This works as expected. I can create an index on the orders table and when I run a query with a WHERE clause of paging_func(modified_at, id) < pagine_func(some_specfic_timestamp, some_specific_uuid) it returns the results that I would expect.

HOWEVER this only works for data that already existed in the table when I created the index. If I insert data into the table, INSERT id, created_at, modified_at, entity VALUES(?,now(),now(),?), my previous where clause does not work for the newly inserted data. The data appears at the top (least in value) of the index.

For example, if I have two rows with values:

id                                    | modified_at
--------------------------------------------------------------------
199967e2-0987-2987-11c7-bbca1348467e  | 2020-01-14 20:14:25.799287
298bc22a-6eaa-5ec3-d962-ad2d206a4dca  | 2020-01-14 20:14:25.799287

If I create the index with the rows already existing the database and query with

WHERE paging_func(modified_at, id) < paging_func(to_timestamp('2020-01-14 20:14:25.799287',
          'YYYY/MM/DD HH24:MI:SS.US'),
          uuid_in('298bc22a-6eaa-5ec3-d962-ad2d206a4dca'))

The result set will only contain the first row. If however, only the first row exists when the index is created and I insert the second row into the table, and run the same exact query both rows are returned.

If I delete the index and re-create the index, the index behaves as expected for the data in the table, but all new values inserted into the table are not indexed correctly. I would appreciate any help explaining what I am doing incorrectly and why I am observing this behavior.

Well, your function isn't really immutable, so it's no surprise that you get strange results from an index that relies on that wrong information. Why don't you simply create an index on (id, the modified_at)? — user330315
– user330315, Commented Jan 14, 2020 at 20:45
@a_horse_with_no_name, How is it not immutable? This is from the postgres docs IMMUTABLE indicates that the function cannot modify the database and always returns the same result when given the same argument values; that is, it does not do database lookups or otherwise use information not directly present in its argument list. If this option is given, any call of the function with all-constant arguments can be immediately replaced with the function value. — Needs Help
– Needs Help, Commented Jan 14, 2020 at 20:50
See Laurenz' answer for why your function isn't really immutable. — user330315
– user330315, Commented Jan 14, 2020 at 21:09

Laurenz Albe · Accepted Answer · 2020-01-14 20:56:39Z

The reason is that you were lying when you said that the function was immutable:

SET timezone = 'UTC';

SELECT sortable_timestamp_and_uuid('2020-01-01 00:00:00+00',
                                   '9a1b6ef4-370f-11ea-9c8d-d0c637b5521b');

                   sortable_timestamp_and_uuid                    
------------------------------------------------------------------
 A2020-01-01 00:00:00.000000/9a1b6ef4-370f-11ea-9c8d-d0c637b5521b
(1 row)

SET timezone = 'Europe/Vienna';

SELECT sortable_timestamp_and_uuid('2020-01-01 00:00:00+00',
                                   '9a1b6ef4-370f-11ea-9c8d-d0c637b5521b');

                   sortable_timestamp_and_uuid                    
------------------------------------------------------------------
 A2020-01-01 01:00:00.000000/9a1b6ef4-370f-11ea-9c8d-d0c637b5521b
(1 row)

So when timezone is different when you write the row and when you are trying to SELECT it, the query might not find the row. In short, data corruption.

Use a function like this:

CREATE FUNCTION sortable_timestamp_and_uuid(timestamp with time zone, uuid) RETURNS text AS
$$SELECT 'A' || ($1 AT TIME ZONE 'UTC')::text || '/' || $2::text$$
LANGUAGE sql IMMUTABLE;

user330315 · Accepted Answer · 2020-01-14 21:09:04Z

0

The whole function is unnecessary.

Create a multi-column index:

create index my_paging_idx on orders(modified_at, id);

Then change your WHERE clause to:

WHERE (modified_at, id) < (timestamp '2020-01-14 20:14:25.799287', '298bc22a-6eaa-5ec3-d962-ad2d206a4dca'::uuid)

Should be just as efficient, and the index will be smaller as well as it contains the binary representation of both values rather than the strings.

answered Jan 14, 2020 at 21:09

user330315

Collectives™ on Stack Overflow

Postgres Index Function Appears to Create Different Values Depending On When It's Called

2 Answers 2

Comments

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Related