4

I have been struggling to optimize a recursive call done purely in ruby. I have moved the data onto a postgresql database, and I would like to make use of the WITH RECURSIVE function that postgresql offers.

The examples that I could find all seems to use a single table, such as a menu or a categories table.

My situation is slightly different. I have a questions and an answers table.

+----------------------+     +------------------+
| questions            |     | answers          |
+----------------------+     +------------------+
| id                   |     | source_id        | <- from question ID
| start_node (boolean) |     | target_id        | <- to question ID
| end_node (boolean)   |     +------------------+
+----------------------+

I would like to fetch all questions that's connected together by the related answers.

I would also like to be able to go the other way in the tree, e.g from any given node to the root node in the tree.

To give another example of a question-answer tree in a graphical way:

Q1
 |-- A1
 |    '-- Q2
 |         |-- A2
 |         |    '-- Q3
 |         '-- A3
 |             '-- Q4
 '-- A4
      '-- Q5

As you can see, a question can have multiple outgoing questions, but they can also have multiple incoming answers -- any-to-many.

I hope that someone has a good idea, or can point me to some examples, articles or guides.

Thanks in advance, everybody.

Regards, Emil

2
  • This does not seem recursive. Can a question be an answer to a question? (meta question, this is ...) Commented Sep 19, 2016 at 22:08
  • An answer is an answer to a question, but an answer leads to another question, and so on. So the chain could be infinitely long. Answers could lead back onto questions passed earlier in the chain. I hope this answers the question? Commented Sep 20, 2016 at 18:46

2 Answers 2

3

This is far, far from ideal but I would play around recursive query over joins, like that:

WITH RECURSIVE questions_with_answers AS (
    SELECT 
        q.*, a.* 
    FROM 
        questions q 
    LEFT OUTER JOIN 
        answers a ON (q.id = a.source_id)

    UNION ALL

    SELECT 
        q.*, a.*
    FROM 
        questions_with_answers qa 
    JOIN 
        questions q ON (qa.target_id = q.id) 
    LEFT OUTER JOIN 
        answers a ON (q.id = a.source_id) 
)
SELECT * FROM questions_with_answers WHERE source_id IS NOT NULL AND target_id IS NOT NULL;

Which gives me result:

 id | name | start_node | end_node | source_id | target_id 
----+------+------------+----------+-----------+-----------
  1 | Q1   |            |          |         1 |         2
  2 | A1   |            |          |         2 |         3
  3 | Q2   |            |          |         3 |         4
  3 | Q2   |            |          |         3 |         6
  4 | A2   |            |          |         4 |         5
  6 | A3   |            |          |         6 |         7
  1 | Q1   |            |          |         1 |         8
  8 | A4   |            |          |         8 |         9
  2 | A1   |            |          |         2 |         3
  3 | Q2   |            |          |         3 |         6
  3 | Q2   |            |          |         3 |         4
  4 | A2   |            |          |         4 |         5
  6 | A3   |            |          |         6 |         7
  8 | A4   |            |          |         8 |         9
  3 | Q2   |            |          |         3 |         6
  3 | Q2   |            |          |         3 |         4
  6 | A3   |            |          |         6 |         7
  4 | A2   |            |          |         4 |         5
  6 | A3   |            |          |         6 |         7
  4 | A2   |            |          |         4 |         5
(20 rows)
Sign up to request clarification or add additional context in comments.

3 Comments

Hi. Thanks a lot(!) for this answer. It has been really great! I tried tweaking it a bit, but I can't seem to get the last question in the chain to be included in the resulting table. I have created a gist here: gist.github.com/ekampp/cb871918adc0fc931cb02afa20c30651
@EmilKampp, I think it is filtered because of NOT NULL condition for source_id and target_id. Try to change it to WHERE source_id IS NOT NULL OR target_id IS NOT NULL;
Yes, that's also what I arrived at. Thanks a bunch for your help! :D
0

In fact you do not need two tables. I would like to encourage you to analyse this example. Maintaining one table instead of two will save you a lot of trouble, especially when it comes to recursive queries.

This minimal structure contains all the necessary information:

create table the_table (id int primary key, parent_id int);
insert into the_table values
(1, 0),  -- root question
(2, 1),
(3, 1),
(4, 2),
(5, 2),
(6, 1),
(7, 3),
(8, 0),  -- root question
(9, 8);

Whether the node is a question or an answer depends on its position in the tree. Of course, you can add a column with information about the type of node to the table.

Use this query to get answer for both your requests (uncomment adequate where condition):

with recursive cte(id, parent_id, depth, type, root) as (
    select id, parent_id, 1, 'Q', id
    from the_table
    where parent_id = 0
    -- and id = 1   <-- looking for list of a&q for root question #1
union all
    select 
        t.id, t.parent_id, depth+ 1, 
        case when (depth & 1)::boolean then 'A' else 'Q' end, c.root
    from cte c
    join the_table t on t.parent_id = c.id
)
select *
from cte
-- where id = 9   <-- looking for root question for answer #9
order by id;

 id | parent_id | depth | type | root 
----+-----------+-------+------+------
  1 |         0 |     1 | Q    |    1
  2 |         1 |     2 | A    |    1
  3 |         1 |     2 | A    |    1
  4 |         2 |     3 | Q    |    1
  5 |         2 |     3 | Q    |    1
  6 |         1 |     2 | A    |    1
  7 |         3 |     3 | Q    |    1
  8 |         0 |     1 | Q    |    8
  9 |         8 |     2 | A    |    8
(9 rows)

The relationship child - parent is unambiguous and applies to both sides. There is no need to store this information twice. In other words, if we store information about parents, the information about children is redundant (and vice versa). It is one of the fundamental properties of the data structure called tree. See the examples:

-- find parent of node #6
select parent_id 
from the_table 
where id = 6; 

-- find children of node #6
select id 
from the_table 
where parent_id = 6;

1 Comment

I do believe that I need two tables, because the answers contains information about the connection between to questions that would otherwise be lost?

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.