Postgres Recursive Query Behavior

Question

I have a table test with two columns, each referencing the other

ID1	ID2
2	1
3	2
4	3
5	4

Given an id, I need to traverse the chain to return all the rows traversing the chain. Below is the query I have written:

WITH RECURSIVE 
-- backward 
    backward_loop as (
    SELECT id1, id2 
    FROM test t
    WHERE test.id2 = '<id>'
    UNION
    SELECT id1, id2 
    FROM
        test t
        INNER JOIN backward_loop b ON (b.id1 = t.id2)
    ),
     -- forward 
    forward_loop as (
    SELECT id1, id2
    FROM test t
    WHERE test.id1 = '<id>'
    UNION
    SELECT id1, id2
    FROM
        test t
        INNER JOIN forward_loop f ON (f.id2 = t.id1)
)   
SELECT id1, id2
FROM backward_loop bl
UNION ALL
SELECT id1, id2
FROM forward_loop fl;

E.g. if given 2, I need to return all the rows (2,1), (3,2), (4,3) and (5,4). The query works fine and returns all the rows correctly.

However, when I have the below data, where one of the ids (50) has multiple mappings:

ID1	ID2
20	10
30	20
40	30
50	40
50	60

and when I run the query passing 20, I get only (20,10), (30,20), (40,30) and (50,40), but not the row (50,60). When I send in 50, I get back all the rows, but it's an issue when I input any other id.

Any idea how I can get the row (50,60) as well?

Please use the code formatting option for code, and table markdown for data. — Dale K
– Dale K, Commented 2 days ago
And I am interested to know why you haven't accepted an answer to an of your previously asked questions? That doesn't encourage people to assist you. — Dale K
– Dale K, Commented 2 days ago
And please stop signing your questions with your initials... that just noise to future readers — Dale K
– Dale K, Commented 2 days ago
And you only use the "database" tag when you haven't tagged a specific RDBMS (as per the tag guide to usage) — Dale K
– Dale K, Commented 2 days ago

Zegarek · Accepted Answer · 2025-11-18 15:33:21Z

You're looking for a minimum spanning tree. Aside from minor logical and syntactical problems in your current code, you made it so that it's able to go up from 50 to both 40 and 60, but not down from 20 into 50 and then back up to 60.

Instead of reinventing the wheel, you can simply use an implementation of Kruskal algorithm from highly optimised Boost C++ library, made available via pgrouting extension:
_{demo at db<>fiddle}

create extension pgrouting cascade;

prepare pgrouting_kruskal_dfs(int) as 
select pred as id1
     , node as id2 
from pgr_kruskalDFS(
  $x$ select row_number()over() as id
           , id1 as source
           , id2 as target
           , 1   as cost
      from test 
  $x$, $1)
where depth<>0;

execute pgrouting_kruskal_dfs(20);

id1	id2
20	10
20	30
30	40
40	50
50	60

To maintain the original layout of IDs, you need to join back to the table because to pgrouting those are just vertex identifiers in an undirected graph, making (50,60) and (60,50) equivalent:

prepare kruskal_original_id_layout(int)as
select t.id1,t.id2
from pgr_kruskalDFS(
  $x$ select row_number()over() as id
           , id1 as source
           , id2 as target
           , 1   as cost
      from test 
  $x$, $1)p(seq,depth,start_vid,id1,id2)
join test as t
on least(t.id1,t.id2)=least(p.id1,p.id2)
and greatest(t.id1,t.id2)=greatest(p.id1,p.id2)
and depth<>0;

execute kruskal_original_id_layout(20);

id1	id2
20	10
30	20
40	30
50	40
50	60

This problem is somewhat related to the one in your approach. If you update your hierarchy to use canonicalised format, e.g. make sure lower ID comes first, it'll work.

update test 
  set id1=id2
     ,id2=id1
  where id2>id1;

execute op_query(20);

id1	id2
30	20
40	30
50	40
60	50
20	10

If these relationships aren't commutative, you can't flip them but you're free to traverse in both directions, you can reduce your two CTEs to a single one running bidirectional search and keeping track of edges it already traversed. It's able to go up, or down, or both, at any time, so it's capable of diving from 20 to 50, then back up to 60:

PREPARE op_query2(int) AS
WITH RECURSIVE traversal AS (
    SELECT id1, id2, array[test] path
    FROM test
    WHERE id2 = $1
    UNION ALL
    SELECT t.id1, t.id2, b.path||t
    FROM test AS t
    JOIN traversal AS b 
      ON array[b.id1, b.id2] && array[t.id1, t.id2]
     AND NOT t=any(path)
)TABLE traversal;

execute op_query2(20);

id1	id2	path
30	20	{"(30,20)"}
20	10	{"(30,20)","(20,10)"}
40	30	{"(30,20)","(40,30)"}
50	40	{"(30,20)","(40,30)","(50,40)"}
50	60	{"(30,20)","(40,30)","(50,40)","(50,60)"}

SELECT id1, id2 FROM test t INNER JOIN backward_loop b is ambiguous because both t and b have those columns. You need to specify which one you want them from: SELECT t.id1, t.id2
Once you alias a source, you need to use the alias, not the base name. You can't do FROM test t WHERE test.id2. Because test is now t, the WHERE clause needs to use the alias WHERE t.id2.
A plain JOIN is by default an INNER JOIN.
Unless you really expect duplicates and wish to get rid of them, don't use UNION which defaults to UNION DISTINCT, just stick to UNION ALL.
TABLE is a shorthand form of SELECT * FROM, handy if you're just fetching entire sets.
&& checks if the two arrays overlap, which means [a,b]&&[c,d] checks if a=c OR a=d OR b=c OR b=d.
You can use a table name or alias as an expression/field/variable that will be a composite, a whole-row record. That's how I initially set up path, as an array of table test's records with the first one already in. Then, || keeps appending newly traversed edges (it's array concat/append operator, not to be confused with string concatenation).
NOT t=any(path) makes sure we're not visiting edges twice, by checking if they're already in the path we already followed.

To be clear: the minimum part is optional, it just hast to return the tree that the target vertex belongs to, and Kruskal/Prim just happen to be readily available implementations that can do that. They'd be faster if they moved completely arbitrarily, not trying to minimise weights. There's also pgr_ConnectedComponents() that just extracts trees, but it extracts all of them, so the time saved on not trying to minimise weights is still wasted, just elsewhere, on trees outside our interest.

Collectives™ on Stack Overflow

Postgres Recursive Query Behavior

1 Answer 1

2 Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

2 Comments

Your Answer

Sign up or log in

Post as a guest

Related