1

I have a table test with two columns, each referencing the other

ID1 ID2
2 1
3 2
4 3
5 4

Given an id, I need to traverse the chain to return all the rows traversing the chain. Below is the query I have written:

WITH RECURSIVE 
-- backward 
    backward_loop as (
    SELECT id1, id2 
    FROM test t
    WHERE test.id2 = '<id>'
    UNION
    SELECT id1, id2 
    FROM
        test t
        INNER JOIN backward_loop b ON (b.id1 = t.id2)
    ),
     -- forward 
    forward_loop as (
    SELECT id1, id2
    FROM test t
    WHERE test.id1 = '<id>'
    UNION
    SELECT id1, id2
    FROM
        test t
        INNER JOIN forward_loop f ON (f.id2 = t.id1)
)   
SELECT id1, id2
FROM backward_loop bl
UNION ALL
SELECT id1, id2
FROM forward_loop fl;

E.g. if given 2, I need to return all the rows (2,1), (3,2), (4,3) and (5,4). The query works fine and returns all the rows correctly.

However, when I have the below data, where one of the ids (50) has multiple mappings:

ID1 ID2
20 10
30 20
40 30
50 40
50 60

and when I run the query passing 20, I get only (20,10), (30,20), (40,30) and (50,40), but not the row (50,60). When I send in 50, I get back all the rows, but it's an issue when I input any other id.

Any idea how I can get the row (50,60) as well?

9
  • Please use the code formatting option for code, and table markdown for data. Commented 2 days ago
  • And I am interested to know why you haven't accepted an answer to an of your previously asked questions? That doesn't encourage people to assist you. Commented 2 days ago
  • 1
    And please stop signing your questions with your initials... that just noise to future readers Commented 2 days ago
  • And you only use the "database" tag when you haven't tagged a specific RDBMS (as per the tag guide to usage) Commented 2 days ago
  • 1
    Also is it guaranteed that there are no loops? Commented 2 days ago

1 Answer 1

3

You're looking for a minimum spanning tree. Aside from minor logical and syntactical problems in your current code, you made it so that it's able to go up from 50 to both 40 and 60, but not down from 20 into 50 and then back up to 60.

Instead of reinventing the wheel, you can simply use an implementation of Kruskal algorithm from highly optimised Boost C++ library, made available via pgrouting extension:
demo at db<>fiddle

create extension pgrouting cascade;

prepare pgrouting_kruskal_dfs(int) as 
select pred as id1
     , node as id2 
from pgr_kruskalDFS(
  $x$ select row_number()over() as id
           , id1 as source
           , id2 as target
           , 1   as cost
      from test 
  $x$, $1)
where depth<>0;

execute pgrouting_kruskal_dfs(20);
id1 id2
20 10
20 30
30 40
40 50
50 60

To maintain the original layout of IDs, you need to join back to the table because to pgrouting those are just vertex identifiers in an undirected graph, making (50,60) and (60,50) equivalent:

prepare kruskal_original_id_layout(int)as
select t.id1,t.id2
from pgr_kruskalDFS(
  $x$ select row_number()over() as id
           , id1 as source
           , id2 as target
           , 1   as cost
      from test 
  $x$, $1)p(seq,depth,start_vid,id1,id2)
join test as t
on least(t.id1,t.id2)=least(p.id1,p.id2)
and greatest(t.id1,t.id2)=greatest(p.id1,p.id2)
and depth<>0;

execute kruskal_original_id_layout(20);
id1 id2
20 10
30 20
40 30
50 40
50 60

This problem is somewhat related to the one in your approach. If you update your hierarchy to use canonicalised format, e.g. make sure lower ID comes first, it'll work.

update test 
  set id1=id2
     ,id2=id1
  where id2>id1;

execute op_query(20);
id1 id2
30 20
40 30
50 40
60 50
20 10

If these relationships aren't commutative, you can't flip them but you're free to traverse in both directions, you can reduce your two CTEs to a single one running bidirectional search and keeping track of edges it already traversed. It's able to go up, or down, or both, at any time, so it's capable of diving from 20 to 50, then back up to 60:

PREPARE op_query2(int) AS
WITH RECURSIVE traversal AS (
    SELECT id1, id2, array[test] path
    FROM test
    WHERE id2 = $1
    UNION ALL
    SELECT t.id1, t.id2, b.path||t
    FROM test AS t
    JOIN traversal AS b 
      ON array[b.id1, b.id2] && array[t.id1, t.id2]
     AND NOT t=any(path)
)TABLE traversal;

execute op_query2(20);
id1 id2 path
30 20 {"(30,20)"}
20 10 {"(30,20)","(20,10)"}
40 30 {"(30,20)","(40,30)"}
50 40 {"(30,20)","(40,30)","(50,40)"}
50 60 {"(30,20)","(40,30)","(50,40)","(50,60)"}
  1. SELECT id1, id2 FROM test t INNER JOIN backward_loop b is ambiguous because both t and b have those columns. You need to specify which one you want them from: SELECT t.id1, t.id2
  2. Once you alias a source, you need to use the alias, not the base name. You can't do FROM test t WHERE test.id2. Because test is now t, the WHERE clause needs to use the alias WHERE t.id2.
  3. A plain JOIN is by default an INNER JOIN.
  4. Unless you really expect duplicates and wish to get rid of them, don't use UNION which defaults to UNION DISTINCT, just stick to UNION ALL.
  5. TABLE is a shorthand form of SELECT * FROM, handy if you're just fetching entire sets.
  6. && checks if the two arrays overlap, which means [a,b]&&[c,d] checks if a=c OR a=d OR b=c OR b=d.
  7. You can use a table name or alias as an expression/field/variable that will be a composite, a whole-row record. That's how I initially set up path, as an array of table test's records with the first one already in. Then, || keeps appending newly traversed edges (it's array concat/append operator, not to be confused with string concatenation).
  8. NOT t=any(path) makes sure we're not visiting edges twice, by checking if they're already in the path we already followed.
Sign up to request clarification or add additional context in comments.

2 Comments

To be clear: the minimum part is optional, it just hast to return the tree that the target vertex belongs to, and Kruskal/Prim just happen to be readily available implementations that can do that. They'd be faster if they moved completely arbitrarily, not trying to minimise weights. There's also pgr_ConnectedComponents() that just extracts trees, but it extracts all of them, so the time saved on not trying to minimise weights is still wasted, just elsewhere, on trees outside our interest.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.