16

My table in postgres looks like below, Table stores a chain sort of relation between IDs and I want to have a query which can produce the result like "vc1" -> "rc7" or "vc3"->"rc7", I will only query on the IDs in first column ID1

ID1     ID2
"vc1"   "vc2"
"vc2"   "vc3"
"vc3"   "vc4"
"vc4"   "rc7"

So I want to supply some "head" id here for which I have to fetch the tail(last in the chain) id.

5
  • Well: just join them resursively and anchor/restrict the head and tail to the {vc1,rc7} or {vc3,rc7} begin/end-conditions. BTW: is is not clear what you want, do you want the full path from {vc1,rc7}, or just a boolean that indicates the existence of such a path ? Commented Jun 23, 2013 at 14:37
  • Hi @wildplasser! thanks for the direction, but I wish I could make full sense out of that. I am not skilled with sql. I cant understand the part anchor/restrict head or tail. I forgot to mention in the question but I will only be querying on the basis of some head id only. Commented Jun 23, 2013 at 14:40
  • @wildplasser And I do not want full path, I want to supply the head and want to fetch the tail. Commented Jun 23, 2013 at 14:41
  • 1
    Given the head, there may be more than one tail. In fact a whole tree. The question is not very clear (and shows no effort at all) So your data is guaranteed to contain a linear and no loops ? Commented Jun 23, 2013 at 14:43
  • My bad to hurry in posting the question and not putting in much detail, by the design of application where it is used, more than one tail is not possible. So that part is assured. Commented Jun 23, 2013 at 14:47

2 Answers 2

44

This is a classic use of a simple recursive common table expression (WITH RECURSIVE), available in PostgreSQL 8.4 and later.

Demonstrated here: http://sqlfiddle.com/#!12/78e15/9

Given the sample data as SQL:

CREATE TABLE Table1
    ("ID1" text, "ID2" text)
;

INSERT INTO Table1
    ("ID1", "ID2")
VALUES
    ('vc1', 'vc2'),
    ('vc2', 'vc3'),
    ('vc3', 'vc4'),
    ('vc4', 'rc7')
;

You could write:

WITH RECURSIVE chain(from_id, to_id) AS (
  SELECT NULL, 'vc2'
  UNION
  SELECT c.to_id, t."ID2"
  FROM chain c
  LEFT OUTER JOIN Table1 t ON (t."ID1" = to_id)
  WHERE c.to_id IS NOT NULL
)
SELECT from_id FROM chain WHERE to_id IS NULL;

What this does is iteratively walk the chain, adding each row to the chain table as from- and to-pointers. When it encounters a row for which the 'to' reference doesn't exist it will add a null 'to' reference for that row. The next iteration will notice that the 'to' reference is null and produce zero rows, which causes the iteration to end.

The outer query then picks up rows that've been determined to be the end of the chain by having a non-existent to_id.

It takes a bit of effort to get your head around recursive CTEs. They key things to understand are:

  • They start with the output of an initial query, which they repeatedly union with the output of the "recursive part" (the query after the UNION or UNION ALL) until the recursive part adds no rows. That stops iteration.

  • They aren't really recursive, more iterative, though they're good for the sorts of things you might use recursion for.

So you're basically building a table in a loop. You can't delete rows or change them, only add new ones, so you generally need an outer query that filters the results to get the result rows you want. You'll often add extra columns containing intermediate data that you use to track the state of the iteration, control stop-conditions, etc.

It can help to look at the unfiltered result. If I replace the final summary query with a simple SELECT * FROM chain I can see the table that's been generated:

 from_id | to_id 
---------+-------
         | vc2
 vc2     | vc3
 vc3     | vc4
 vc4     | rc7
 rc7     | 
(5 rows)

The first row is the manually added starting point row, where you specify what you want to look up - in this case that was vc2. Each subsequent row was added by the UNIONed recursive term, which does a LEFT OUTER JOIN on the previous result and returns a new set of rows that pair up the previous to_id (now in the from_id column) to the next to_id. If the LEFT OUTER JOIN doesn't match then the to_id will be null, causing the next invocation to return now rows and end iteration.

Because this query doesn't attempt to add only the last row each time, it's actually repeating a fair bit of work each iteration. To avoid that you would need to use an approach more like Gordon's, but additionally filter on the previous depth field when you scanned input table, so you joined only the most recent row. In practice this usually isn't necessary, but it can be a concern for very big data sets or where you can't create appropriate indexes.

More can be learned in the PostgreSQL documentation on CTEs.

Sign up to request clarification or add additional context in comments.

1 Comment

I thought this was a pretty good explanation of the same thing. postgresqltutorial.com/postgresql-recursive-query
31

Here is the SQL using a recursive CTE:

with recursive tr(id1, id2, level) as (
      select t.id1, t.id2, 1 as level
      from t union all
      select t.id1, tr.id2, tr.level + 1
      from t join
           tr
           on t.id2 = tr.id1
     )
select *
from (select tr.*,
             max(level) over (partition by id1) as maxlevel
      from tr
     ) tr
where level = maxlevel;

Here is the SQLFiddle

8 Comments

just exactly what I wanted! Thanks. never used WITH RECURSIVE/CTEs in the past.
Argh... Stack Overflow didn't bother loading your answer until I'd mostly finished mine, even though it was 8 hours ago. Posted anyway, as it can be a lot simpler.
@david_adler . . . You just have to select another version of Postgres: sqlfiddle.com/#!15/11833/1.
Is there anyway to avoid iterating the children of a child that has already been seen/marked?
@JustinFuruness . . . join means inner join and it is easier to type.
|

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.