90

I don't know quite how to phrase this so please help me with the title as well. :)

I have two tables. Let's call them A and B. The B table has an a_id foreign key that points at A.id. Now I would like to write a SELECT statement that fetches all A records, with an additional column containing the number of B records that reference each A row.

I'm using PostgreSQL 9 right now, but I guess this would be a generic SQL question?

EDIT:

In the end I went for a trigger-cache solution, where A.b_count is updated via a function each time B changes.
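
For anyone curious what such a trigger cache can look like, here is a minimal sketch. It is not the code I actually deployed: to keep it self-contained it uses SQLite triggers driven from Python's sqlite3 module, whereas in PostgreSQL you would write a trigger function and attach it with CREATE TRIGGER. The trigger names and sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- a.b_count is the cached number of b rows pointing at each a row.
    CREATE TABLE a (id INTEGER PRIMARY KEY, b_count INTEGER NOT NULL DEFAULT 0);
    CREATE TABLE b (id INTEGER PRIMARY KEY, a_id INTEGER NOT NULL REFERENCES a(id));

    -- Hypothetical trigger names; one trigger per kind of change to b.
    CREATE TRIGGER b_after_insert AFTER INSERT ON b
    BEGIN
        UPDATE a SET b_count = b_count + 1 WHERE id = NEW.a_id;
    END;

    CREATE TRIGGER b_after_delete AFTER DELETE ON b
    BEGIN
        UPDATE a SET b_count = b_count - 1 WHERE id = OLD.a_id;
    END;

    CREATE TRIGGER b_after_update AFTER UPDATE OF a_id ON b
    BEGIN
        UPDATE a SET b_count = b_count - 1 WHERE id = OLD.a_id;
        UPDATE a SET b_count = b_count + 1 WHERE id = NEW.a_id;
    END;
""")


def snapshot():
    """Return the cached counts as a list of (a.id, a.b_count) tuples."""
    return conn.execute("SELECT id, b_count FROM a ORDER BY id").fetchall()


conn.execute("INSERT INTO a (id) VALUES (1), (2)")
conn.execute("INSERT INTO b (id, a_id) VALUES (1, 1), (2, 1), (3, 2)")
after_insert = snapshot()

conn.execute("UPDATE b SET a_id = 2 WHERE id = 1")  # move one b row from a=1 to a=2
after_move = snapshot()

conn.execute("DELETE FROM b WHERE id = 3")
after_delete = snapshot()

print(after_insert, after_move, after_delete)
```

Reads become a plain SELECT on A, at the cost of extra work on every write to B.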

1
  • It might be better to use a JOIN for performance reasons. Commented Dec 26, 2010 at 23:45

6 Answers

143
SELECT A.*, (SELECT COUNT(*) FROM B WHERE B.a_id = A.id) AS TOT FROM A

6 Comments

Does this type of nested select have a performance penalty worth worrying about?
Yes, there is. The nested select will be executed for each row that is retrieved from table A.
Hm, so I'm guessing that it'd be much more efficient to create a column in the A table, and update the value with a trigger when the B table is modified?
Keeping a cached column would be the most efficient for reading but introduces its own problems during updates. The most efficient way of doing this without a cache column is to use a JOIN and GROUP BY (see my answer for an example).
This solution is inefficient (slow).
40

I think the comment by @intgr on another answer is so valuable that I'm putting it forward as an alternative answer, since this method lets you filter on the calculated column efficiently.

SELECT
  a.*,
  COUNT(b.id) AS b_count
FROM a
INNER JOIN b ON b.a_id = a.id
WHERE a.id > 50 AND b.id < 100 -- example of filtering joined tables, optional
GROUP BY a.id -- a.id must be the primary key of a (PostgreSQL 9.1+)
HAVING COUNT(b.id) > 10 -- example of filtering calculated column, optional
ORDER BY a.id

1 Comment

For people looking to filter with count 0 as in HAVING COUNT(b.id) = 0, use LEFT OUTER JOIN and it will work.
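
To illustrate the comment above, here is a small sketch that runs the same HAVING COUNT(b.id) = 0 filter with both join types. It uses Python's sqlite3 module with an in-memory database purely so that it is self-contained (the question is about PostgreSQL), and the sample rows are made up.

```python
import sqlite3

# Hypothetical sample data: three rows in a, only a.id = 1 and 2 have rows in b.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE b (id INTEGER PRIMARY KEY, a_id INTEGER REFERENCES a(id));
    INSERT INTO a (id, name) VALUES (1, 'one'), (2, 'two'), (3, 'three');
    INSERT INTO b (id, a_id) VALUES (10, 1), (11, 1), (12, 2);
""")

# Same query twice; only the join type changes.
QUERY = """
    SELECT a.id, COUNT(b.id) AS b_count
    FROM a
    {join} b ON b.a_id = a.id
    GROUP BY a.id
    HAVING COUNT(b.id) = 0
    ORDER BY a.id
"""

inner_rows = conn.execute(QUERY.format(join="INNER JOIN")).fetchall()
left_rows = conn.execute(QUERY.format(join="LEFT OUTER JOIN")).fetchall()

print(inner_rows)  # the inner join has already dropped a.id = 3
print(left_rows)   # the outer join keeps it with a count of 0
```

The inner join removes unmatched rows of a before HAVING ever sees them, which is why the zero-count filter only makes sense with the outer join.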
15

The subquery solution given above is inefficient. The trigger solution is probably best in a mostly-read database, but for the record here's a join approach that will perform better than a subquery:

SELECT a.id, a.xxx, count(*)
FROM a JOIN b ON (b.a_id = a.id)
GROUP BY a.id, a.xxx

If you're using Django ORM you can simply write:

from django.db.models import Count

res = A.objects.annotate(Count('b'))
print(res[0].b__count)  # holds the result count

4 Comments

Hm, there seem to be many ways to do this. :) I've implemented the triggers, and since this is a mostly-read part of the application (it's a listing of directory-type items with item count per directory record on the dashboard), I think it's the safest bet.
What if you have dozens of columns in a?
Since PostgreSQL 9.1 it's enough to do "GROUP BY primary_key_column", in earlier versions you'd have to name all chosen columns in the GROUP BY.
In a 1-to-0..m relation queried with a LEFT JOIN, count(*) will return 1 even if there are zero FK rows, because the unmatched row of a is still counted. COUNT(b.id) would be correct.
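
A quick way to see the difference described in the last comment, sketched with Python's sqlite3 module and made-up rows (the counting rules for COUNT(*) versus COUNT(column) are standard SQL, but this was run against SQLite, not PostgreSQL):

```python
import sqlite3

# Hypothetical sample data: a.id = 1 has two rows in b, a.id = 2 has none.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER PRIMARY KEY);
    CREATE TABLE b (id INTEGER PRIMARY KEY, a_id INTEGER REFERENCES a(id));
    INSERT INTO a (id) VALUES (1), (2);
    INSERT INTO b (id, a_id) VALUES (10, 1), (11, 1);
""")

# COUNT(*) counts joined rows; COUNT(b.id) counts only rows where b.id is not NULL.
rows = conn.execute("""
    SELECT a.id, COUNT(*) AS star_count, COUNT(b.id) AS id_count
    FROM a
    LEFT JOIN b ON b.a_id = a.id
    GROUP BY a.id
    ORDER BY a.id
""").fetchall()

for a_id, star_count, id_count in rows:
    print(a_id, star_count, id_count)
```

For a.id = 2 the LEFT JOIN produces one row padded with NULLs, so COUNT(*) reports 1 while COUNT(b.id) reports 0.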
13

The accepted answer is inefficient (slow) based on my tests, because the subquery on table B is executed for every row of table A. I'm using the following approach based on grouping and joining. It works much faster:

SELECT A.id, QTY.quantity FROM A
LEFT JOIN
    (SELECT COUNT(B.a_id) AS quantity, B.a_id FROM B GROUP BY B.a_id) AS QTY
ON A.id = QTY.a_id

Another variant:

SELECT A.id, COUNT(B.a_id) AS quantity FROM A
LEFT JOIN B ON B.a_id = A.id
GROUP BY A.id

2 Comments

Note: Only the first variant lets you select different counts from multiple tables. The second variant will return a product of all the counts if more than one LEFT JOIN / COUNT is added.
If you're running into an issue where you want to avoid [null] values returning in your output, use COALESCE(QTY.quantity, 0) in the outer SELECT (instead of trying to coalesce from within the left join, like I was. oops!)
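
Both comments can be checked with a small sketch. It assumes a second child table c (hypothetical, not part of the question) and uses Python's sqlite3 module with made-up rows so that it runs on its own:

```python
import sqlite3

# Hypothetical data: a.id = 1 has 2 rows in b and 3 rows in c; a.id = 2 has none.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER PRIMARY KEY);
    CREATE TABLE b (id INTEGER PRIMARY KEY, a_id INTEGER REFERENCES a(id));
    CREATE TABLE c (id INTEGER PRIMARY KEY, a_id INTEGER REFERENCES a(id));
    INSERT INTO a (id) VALUES (1), (2);
    INSERT INTO b (id, a_id) VALUES (1, 1), (2, 1);
    INSERT INTO c (id, a_id) VALUES (1, 1), (2, 1), (3, 1);
""")

# Second variant extended to two child tables: each b row pairs with each c row.
joined_rows = conn.execute("""
    SELECT a.id, COUNT(b.a_id), COUNT(c.a_id)
    FROM a
    LEFT JOIN b ON b.a_id = a.id
    LEFT JOIN c ON c.a_id = a.id
    GROUP BY a.id
    ORDER BY a.id
""").fetchall()

# First variant: aggregate each child table on its own, then join the results.
# COALESCE in the outer SELECT turns the NULL of an unmatched row into 0.
derived_rows = conn.execute("""
    SELECT a.id, COALESCE(bq.quantity, 0), COALESCE(cq.quantity, 0)
    FROM a
    LEFT JOIN (SELECT a_id, COUNT(*) AS quantity FROM b GROUP BY a_id) AS bq
        ON bq.a_id = a.id
    LEFT JOIN (SELECT a_id, COUNT(*) AS quantity FROM c GROUP BY a_id) AS cq
        ON cq.a_id = a.id
    ORDER BY a.id
""").fetchall()

print(joined_rows)   # both counts for a.id = 1 are inflated to 2 * 3
print(derived_rows)  # the real counts, with 0 instead of NULL for a.id = 2
```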
2

To answer my own question:

SELECT a.id, a.other_column, ..., 
(SELECT COUNT(*) FROM b where b.a_id = a.id) AS b_count
FROM a;

Comments

0

Whilst a sub-query may be less efficient, how much less efficient depends on the use-case. Another thing to consider is the filters that are being used.

I have a Table A of "Approvers" and a Table B of "Approval tasks".

I want to show a list of ALL approvers along with a count of how many ACTIVE approval tasks they have. Now, my knowledge of SQL is limited, but no matter what I tried with the different types of join, my list of approvers was incomplete. Why? I need to have a filter on table B so that only active tasks are returned. If an approver only has inactive/complete tasks, there is no count. This should show 0, but for some reason it just doesn't show the row at all.

So, I use a sub-query and it works perfectly.
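
The query itself isn't shown here, so the following is only a guess at the situation, sketched with Python's sqlite3 module. The table and column names (approvers, approval_tasks, status) and the rows are assumptions. It compares a LEFT JOIN filtered in WHERE, a LEFT JOIN filtered in the ON clause, and the sub-query:

```python
import sqlite3

# Hypothetical data: approver 1 has one active task, approver 2 only a
# completed one, approver 3 has no tasks at all.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE approvers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE approval_tasks (
        id INTEGER PRIMARY KEY,
        approver_id INTEGER REFERENCES approvers(id),
        status TEXT
    );
    INSERT INTO approvers (id, name) VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO approval_tasks (id, approver_id, status)
    VALUES (1, 1, 'active'), (2, 1, 'complete'), (3, 2, 'complete');
""")

# Filter in WHERE: runs after the join, so NULL-padded and inactive rows vanish.
where_rows = conn.execute("""
    SELECT p.id, COUNT(t.id)
    FROM approvers p
    LEFT JOIN approval_tasks t ON t.approver_id = p.id
    WHERE t.status = 'active'
    GROUP BY p.id ORDER BY p.id
""").fetchall()

# Filter in ON: only restricts which tasks match; every approver is kept.
on_rows = conn.execute("""
    SELECT p.id, COUNT(t.id)
    FROM approvers p
    LEFT JOIN approval_tasks t
        ON t.approver_id = p.id AND t.status = 'active'
    GROUP BY p.id ORDER BY p.id
""").fetchall()

# Correlated sub-query: one count per approver, zero included.
subquery_rows = conn.execute("""
    SELECT p.id,
           (SELECT COUNT(*) FROM approval_tasks t
            WHERE t.approver_id = p.id AND t.status = 'active')
    FROM approvers p ORDER BY p.id
""").fetchall()

print(where_rows, on_rows, subquery_rows)
```

With this data, the WHERE version loses approvers 2 and 3, while the ON version and the sub-query both list all three approvers.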

2 Comments

Hello Gryzor. Welcome to StackOverflow. Could you please clarify your answer with a code example explaining what you wrote? Also, please confirm that your answer is substantively different than the five answers already posted.
It doesn't need a code example, as plenty have been provided and mine is no different. I'm just saying that the correct solution depends on the use case. All of the join examples above fail if a row in table A doesn't have any corresponding rows in table B. Table A can have 10 approvers, but if one of those approvers has no "active tasks" in table B, the query will only return 9 rows. This might be fine for some purposes, but in my case I still need to return all 10 approvers regardless, so joins don't work. That is why a sub-query is sometimes fine, even if it is not optimal.
