Postgres graph query gets slow because of a nested loop join

Question

Context

I am using Postgres as a multi-tenant graph database.

All data is stored in a single table, and I rely on partial indexes to make queries efficient.

Here's how a minimal schema looks:

CREATE TABLE triples (
    app_id text NOT NULL,
    entity_id text NOT NULL,
    attr_id text NOT NULL,
    value jsonb NOT NULL,
    eav boolean NOT NULL DEFAULT false,
    ave boolean NOT NULL DEFAULT false,
    vae boolean NOT NULL DEFAULT false,
    created_at bigint NOT NULL DEFAULT 0,
    checked_data_type text
);

CREATE OR REPLACE FUNCTION triples_extract_number_value(value jsonb)
RETURNS double precision AS $$
BEGIN
  IF jsonb_typeof(value) = 'number' THEN
    RETURN value::double precision;
  ELSE
    RETURN NULL;
  END IF;
END;
$$ LANGUAGE plpgsql IMMUTABLE;

CREATE INDEX vae_idx ON triples (app_id, value, attr_id, entity_id) WHERE vae;

CREATE INDEX ave_number_idx ON triples (app_id, attr_id, triples_extract_number_value(value), entity_id) WHERE ave AND checked_data_type = 'number';

In an app with conversations, groups, and messages, data could be stored as:

{app_id: 'app_1', entity_id: 'convo_1', attr_id: 'title', value: "My Conversation"}
{app_id: 'app_1', entity_id: 'group_1', attr_id: 'title', value: "My Group"}
{app_id: 'app_1', entity_id: 'convo_1', attr_id: 'group', value: "group_1", vae: true} // this links convos to groups
{app_id: 'app_1', entity_id: 'msg_1', attr_id: 'time', value: 123, ave: true, checked_data_type: 'number'} // this indexes our 'time' field
{app_id: 'app_1', entity_id: 'msg_1', attr_id: 'convo', value: "convo_1", vae: true} // this links `msg_1` to `convo_1`

Goal

I want to satisfy the following query:

"Give me all convos which belong to group 'group_1' and have a message whose time is at least 5."

To do this, I wrote the following query, which joins triples against itself across three aliases:

SELECT 
    DISTINCT(match_0_0.entity_id)
FROM 
    triples AS match_0_0
JOIN 
    triples AS match_0_1
    ON match_0_1.app_id = match_0_0.app_id
    AND match_0_1.vae = true
    AND match_0_1.attr_id = 'convo'
    AND match_0_1.value = to_jsonb(match_0_0.entity_id)
JOIN 
    triples AS match_0_2
    ON match_0_2.app_id = match_0_1.app_id
    AND match_0_2.ave = true
    AND match_0_2.attr_id = 'time'
    AND triples_extract_number_value(match_0_2.value) >= 5
    AND match_0_2.checked_data_type = 'number'
    AND match_0_2.entity_id = match_0_1.entity_id
WHERE 
    match_0_0.app_id = 'chat_app'
    AND match_0_0.vae = true
    AND match_0_0.attr_id = 'groups'
    AND match_0_0.value = '"group_1"';

Problem

This query takes about 14 seconds to run. Running it with EXPLAIN (ANALYZE, BUFFERS) shows:

QUERY PLAN
Unique  (cost=1.10..51.92 rows=1 width=8) (actual time=0.588..14786.024 rows=150 loops=1)
  Buffers: shared hit=523495
  ->  Nested Loop  (cost=1.10..51.92 rows=1 width=8) (actual time=0.588..14784.589 rows=5996 loops=1)
        Buffers: shared hit=523495
        ->  Nested Loop  (cost=0.82..37.31 rows=4 width=25) (actual time=0.048..29.058 rows=12000 loops=1)
              Buffers: shared hit=1499
              ->  Index Only Scan using vae_idx on triples match_0_0  (cost=0.41..11.98 rows=3 width=17) (actual time=0.029..0.533 rows=300 loops=1)
                    Index Cond: ((app_id = 'chat_app'::text) AND (value = '"group_1"'::jsonb) AND (attr_id = 'groups'::text))
                    Heap Fetches: 300
                    Buffers: shared hit=55
              ->  Index Only Scan using vae_idx on triples match_0_1  (cost=0.41..8.44 rows=1 width=34) (actual time=0.016..0.079 rows=40 loops=300)
                    Index Cond: ((app_id = 'chat_app'::text) AND (value = to_jsonb(match_0_0.entity_id)) AND (attr_id = 'convo'::text))
                    Heap Fetches: 12000
                    Buffers: shared hit=1444
        ->  Index Scan using ave_number_idx on triples match_0_2  (cost=0.28..3.64 rows=1 width=17) (actual time=0.929..1.229 rows=0 loops=12000)
              Index Cond: ((app_id = 'chat_app'::text) AND (attr_id = 'time'::text) AND (triples_extract_number_value(value) >= '5'::double precision) AND (entity_id = match_0_1.entity_id))
              Buffers: shared hit=521996
Planning:
  Buffers: shared hit=109
Planning Time: 0.477 ms
Execution Time: 14786.197 ms

Nested Loop

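Looking at the EXPLAIN (ANALYZE, BUFFERS) output, I noticed that the outer Nested Loop accounts for almost all of the buffer hits. Its inner side, the index scan on ave_number_idx, runs 12000 times (loops=12000) at roughly 1.2 ms each and reads 521996 of the 523495 shared buffers.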

If I try disabling nested loop joins, the query resolves in 40 ms:

SET enable_nestloop TO off;

SELECT 
    DISTINCT(match_0_0.entity_id)
FROM 
    triples AS match_0_0
JOIN 
    triples AS match_0_1
    ON match_0_1.app_id = match_0_0.app_id
    AND match_0_1.vae = true
    AND match_0_1.attr_id = 'convo'
    AND match_0_1.value = to_jsonb(match_0_0.entity_id)
JOIN 
    triples AS match_0_2
    ON match_0_2.app_id = match_0_1.app_id
    AND match_0_2.ave = true
    AND match_0_2.attr_id = 'time'
    AND triples_extract_number_value(match_0_2.value) >= 5
    AND match_0_2.checked_data_type = 'number'
    AND match_0_2.entity_id = match_0_1.entity_id
WHERE 
    match_0_0.app_id = 'chat_app'
    AND match_0_0.vae = true
    AND match_0_0.attr_id = 'groups'
    AND match_0_0.value = '"group_1"';

Question

Is there a way to hint to Postgres that it should choose a better join strategy here?

Repro

I set up a repro on DB Fiddle, which shows the slow query:

https://www.db-fiddle.com/f/4jyoMCicNSZpjMt4jFYoz5/15620

Stepan Parunashvili · Accepted Answer · 2024-12-10 17:33:13Z

There were a few ways we discovered to hint to Postgres that a nested loop join was a bad idea here.

Option 1: Make nested_loop more expensive

If we adjusted some PG parameters:

SET random_page_cost = 2.0;
SET cpu_index_tuple_cost = 0.05;
SET cpu_operator_cost = 0.0001;

Then PG would use a hash join for this query.

We ran a backtest on other queries though, and this resulted in poorer performance overall.
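
Since changing these settings globally hurt other queries, one possibility is to scope them to a single transaction with SET LOCAL. The following is a minimal sketch of that idea, not something the backtest above covered; the values are simply the ones listed in this option.

```sql
BEGIN;

-- SET LOCAL applies only until COMMIT or ROLLBACK; other sessions
-- and later transactions keep the server-wide settings.
SET LOCAL random_page_cost = 2.0;
SET LOCAL cpu_index_tuple_cost = 0.05;
SET LOCAL cpu_operator_cost = 0.0001;

-- Confirm the override is active inside this transaction.
SHOW random_page_cost;

-- Run the slow graph query from the question here, ideally under
-- EXPLAIN (ANALYZE, BUFFERS), to check whether the plan changes.

COMMIT;
```

The same pattern works with SET LOCAL enable_nestloop = off, at the cost of having to wrap every affected query in its own transaction.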

Option 2: Add an index where `entity_id` comes before `value`

If we made an index where entity_id came before value, Postgres would use that index and switch to a hash join:

CREATE INDEX aev_number_idx ON triples (app_id, attr_id, entity_id, triples_extract_number_value(value)) WHERE ave AND checked_data_type = 'number';

However, this didn't feel right to us.

We couldn't drop ave_number_idx, because there are scenarios where we look up entity_id by value.

Keeping both indexes would mean duplicating the same data in the new one.
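
To put a number on that duplication, the size and usage of each index on the table can be read from the statistics views. This is a rough sketch that assumes both ave_number_idx and aev_number_idx exist on triples; it reports whatever indexes are actually present.

```sql
-- List every index on triples with its on-disk size and the number
-- of scans recorded since statistics were last reset.
SELECT
    indexrelname AS index_name,
    pg_size_pretty(pg_relation_size(indexrelid)) AS index_size,
    idx_scan AS scans
FROM pg_stat_user_indexes
WHERE relname = 'triples'
ORDER BY pg_relation_size(indexrelid) DESC;
```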

Option 3: Remove `entity_id` from `ave_number_idx`

The final option was to remove entity_id from ave_number_idx:

CREATE INDEX ave_number_idx_no_e ON triples (app_id, attr_id, triples_extract_number_value(value)) WHERE ave AND checked_data_type = 'number';

This forced Postgres to do hash joins. Running a backtest, we didn't see any queries get slower.

We ended up choosing Option 3.
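
One way to try this kind of index change without committing to it is to do it inside a transaction and roll back afterwards, since index DDL in Postgres is transactional. The sketch below assumes the old ave_number_idx is dropped so the planner cannot pick it; that step is an assumption for illustration. DROP INDEX takes a lock that blocks other access to the table, so this is best run against a copy of the data.

```sql
BEGIN;

-- Assumption: remove the old index so the planner cannot choose it.
DROP INDEX IF EXISTS ave_number_idx;

CREATE INDEX IF NOT EXISTS ave_number_idx_no_e
    ON triples (app_id, attr_id, triples_extract_number_value(value))
    WHERE ave AND checked_data_type = 'number';

-- Refresh statistics, including those for the indexed expression.
ANALYZE triples;

-- Same query as in the question; check which join strategy is chosen.
EXPLAIN (ANALYZE, BUFFERS)
SELECT DISTINCT match_0_0.entity_id
FROM triples AS match_0_0
JOIN triples AS match_0_1
    ON match_0_1.app_id = match_0_0.app_id
    AND match_0_1.vae = true
    AND match_0_1.attr_id = 'convo'
    AND match_0_1.value = to_jsonb(match_0_0.entity_id)
JOIN triples AS match_0_2
    ON match_0_2.app_id = match_0_1.app_id
    AND match_0_2.ave = true
    AND match_0_2.attr_id = 'time'
    AND triples_extract_number_value(match_0_2.value) >= 5
    AND match_0_2.checked_data_type = 'number'
    AND match_0_2.entity_id = match_0_1.entity_id
WHERE match_0_0.app_id = 'chat_app'
    AND match_0_0.vae = true
    AND match_0_0.attr_id = 'groups'
    AND match_0_0.value = '"group_1"';

-- Undo the index changes; nothing persists after this.
ROLLBACK;
```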

More alternatives

One method we tried was to rewrite the query as materialized CTEs. However, this only worked sporadically. Some ways of writing the CTEs would produce a hash join, and some would not.
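
For illustration, one possible shape of such a rewrite is sketched below. It is not the exact query that was tested; the CTE names convos and recent_msgs are hypothetical, and, as noted above, the planner may or may not pick a hash join for any given formulation. AS MATERIALIZED requires Postgres 12 or later.

```sql
WITH convos AS MATERIALIZED (
    -- Conversations linked to the group.
    SELECT entity_id
    FROM triples
    WHERE app_id = 'chat_app'
      AND vae
      AND attr_id = 'groups'
      AND value = '"group_1"'
),
recent_msgs AS MATERIALIZED (
    -- Messages whose indexed time value is at least 5.
    SELECT entity_id
    FROM triples
    WHERE app_id = 'chat_app'
      AND ave
      AND checked_data_type = 'number'
      AND attr_id = 'time'
      AND triples_extract_number_value(value) >= 5
)
SELECT DISTINCT c.entity_id
FROM convos AS c
JOIN triples AS m
    -- Message-to-conversation links.
    ON m.app_id = 'chat_app'
    AND m.vae
    AND m.attr_id = 'convo'
    AND m.value = to_jsonb(c.entity_id)
JOIN recent_msgs AS r
    ON r.entity_id = m.entity_id;
```

Materializing each side means it is computed once before the join, rather than being probed through an index for every outer row.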

We tried using extended statistics too, but were not able to create a statistic that told Postgres this nested loop join was a bad idea.
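
For reference, an extended-statistics attempt generally looks like the sketch below. The statistics name and the column choice are assumptions for illustration, not the definitions that were actually tried. The planner mainly applies these statistics to conditions on a single table, which may be part of why they did not help with a join estimate.

```sql
-- Hypothetical statistics object over the columns used in the joins.
CREATE STATISTICS triples_app_attr_entity_stats (ndistinct, dependencies)
    ON app_id, attr_id, entity_id
    FROM triples;

-- Extended statistics are only populated by ANALYZE.
ANALYZE triples;

-- Inspect what was collected (the pg_stats_ext view exists in Postgres 12+).
SELECT statistics_name, attnames, kinds
FROM pg_stats_ext
WHERE tablename = 'triples';
```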

Hi Stepan, you are allowed to tick your own answer after 24 hrs.
– Rohit Gupta, Commented Dec 11, 2024 at 3:38
