You're looking for a minimum spanning tree. Aside from minor logical and syntactical problems in your current code, you made it so that it's able to go up from 50 to both 40 and 60, but not down from 20 into 50 and then back up to 60.
Instead of reinventing the wheel, you can simply use an implementation of Kruskal algorithm from highly optimised Boost C++ library, made available via pgrouting extension:
demo at db<>fiddle
create extension pgrouting cascade;
prepare pgrouting_kruskal_dfs(int) as
select pred as id1
, node as id2
from pgr_kruskalDFS(
$x$ select row_number()over() as id
, id1 as source
, id2 as target
, 1 as cost
from test
$x$, $1)
where depth<>0;
execute pgrouting_kruskal_dfs(20);
| id1 |
id2 |
| 20 |
10 |
| 20 |
30 |
| 30 |
40 |
| 40 |
50 |
| 50 |
60 |
To maintain the original layout of IDs, you need to join back to the table because to pgrouting those are just vertex identifiers in an undirected graph, making (50,60) and (60,50) equivalent:
prepare kruskal_original_id_layout(int)as
select t.id1,t.id2
from pgr_kruskalDFS(
$x$ select row_number()over() as id
, id1 as source
, id2 as target
, 1 as cost
from test
$x$, $1)p(seq,depth,start_vid,id1,id2)
join test as t
on least(t.id1,t.id2)=least(p.id1,p.id2)
and greatest(t.id1,t.id2)=greatest(p.id1,p.id2)
and depth<>0;
execute kruskal_original_id_layout(20);
| id1 |
id2 |
| 20 |
10 |
| 30 |
20 |
| 40 |
30 |
| 50 |
40 |
| 50 |
60 |
This problem is somewhat related to the one in your approach. If you update your hierarchy to use canonicalised format, e.g. make sure lower ID comes first, it'll work.
update test
set id1=id2
,id2=id1
where id2>id1;
execute op_query(20);
| id1 |
id2 |
| 30 |
20 |
| 40 |
30 |
| 50 |
40 |
| 60 |
50 |
| 20 |
10 |
If these relationships aren't commutative, you can't flip them but you're free to traverse in both directions, you can reduce your two CTEs to a single one running bidirectional search and keeping track of edges it already traversed. It's able to go up, or down, or both, at any time, so it's capable of diving from 20 to 50, then back up to 60:
PREPARE op_query2(int) AS
WITH RECURSIVE traversal AS (
SELECT id1, id2, array[test] path
FROM test
WHERE id2 = $1
UNION ALL
SELECT t.id1, t.id2, b.path||t
FROM test AS t
JOIN traversal AS b
ON array[b.id1, b.id2] && array[t.id1, t.id2]
AND NOT t=any(path)
)TABLE traversal;
execute op_query2(20);
| id1 |
id2 |
path |
| 30 |
20 |
{"(30,20)"} |
| 20 |
10 |
{"(30,20)","(20,10)"} |
| 40 |
30 |
{"(30,20)","(40,30)"} |
| 50 |
40 |
{"(30,20)","(40,30)","(50,40)"} |
| 50 |
60 |
{"(30,20)","(40,30)","(50,40)","(50,60)"} |
SELECT id1, id2 FROM test t INNER JOIN backward_loop b is ambiguous because both t and b have those columns. You need to specify which one you want them from: SELECT t.id1, t.id2
- Once you alias a source, you need to use the alias, not the base name. You can't do
FROM test t WHERE test.id2. Because test is now t, the WHERE clause needs to use the alias WHERE t.id2.
- A plain
JOIN is by default an INNER JOIN.
- Unless you really expect duplicates and wish to get rid of them, don't use
UNION which defaults to UNION DISTINCT, just stick to UNION ALL.
TABLE is a shorthand form of SELECT * FROM, handy if you're just fetching entire sets.
&& checks if the two arrays overlap, which means [a,b]&&[c,d] checks if a=c OR a=d OR b=c OR b=d.
- You can use a table name or alias as an expression/field/variable that will be a composite, a whole-row record. That's how I initially set up
path, as an array of table test's records with the first one already in. Then, || keeps appending newly traversed edges (it's array concat/append operator, not to be confused with string concatenation).
NOT t=any(path) makes sure we're not visiting edges twice, by checking if they're already in the path we already followed.