Postgres group by columns and within group select other columns by max aggregate

Question

This is probably a standard problem, and I've keyed off some other greatest-n-per-group answers, but so far I have been unable to resolve my current problem.

A              B             C
+----+-------+ +----+------+ +----+------+-------+
| id | start | | id | a_id | | id | b_id | name  |
+----+-------+ +----+------+ +----+------+-------+
|  1 |     1 | |  1 |    1 | |  1 |    1 | aname |
|  2 |     2 | |  2 |    1 | |  2 |    2 | aname |
+----+-------+ |  3 |    2 | |  3 |    3 | aname |
               +----+------+ |  4 |    3 | bname |
                             +----+------+-------+

In English what I'd like to accomplish is:

For each c.name, select its newest entry based on the start time in a.start

The SQL I've tried is the following:

SELECT a.id, a.start, c.id, c.name 
FROM a
INNER JOIN (
    SELECT id, MAX(start) as start
    FROM a
    GROUP BY id
) a2 ON a.id = a2.id AND a.start = a2.start
JOIN b
ON a.id = b.a_id
JOIN c
on b.id = c.b_id
GROUP BY c.name;

It fails with errors such as:

ERROR: column "a.id" must appear in the GROUP BY clause or be used in an aggregate function Position: 8

To be useful I really need the ids from the query, but cannot group on them since they are unique. Here is an example of output I'd love for the first case above:

+------+---------+------+--------+
| a.id | a.start | c.id | c.name |
+------+---------+------+--------+
|    2 |       2 |    3 | aname  |
|    2 |       2 |    4 | bname  |
+------+---------+------+--------+

Here is a Sqlfiddle

Edit - removed second case

GROUP BY c.name; is not required.

– Vamsi Prabhala, Commented Jul 11, 2016 at 18:43

Clodoaldo Neto · Accepted Answer · 2016-07-11 19:02:35Z

5

Case 1

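-- keep one row per c.name: the first row in ORDER BY order,
-- which is the row with the greatest a.start for that name
select distinct on (c.name)
    a.id, a.start, c.id, c.name
from
    a
    inner join
    b on a.id = b.a_id
    inner join
    c on b.id = c.b_id
order by c.name, a.start desc
;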
 id | start | id | name  
----+-------+----+-------
  2 |     2 |  3 | aname
  2 |     2 |  4 | bname
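
To reproduce the Case 1 result locally, the question's sample tables can be recreated with something like the following. The column types and key constraints are assumptions; the question shows only the values.

```sql
-- Hypothetical setup: types and constraints are assumed, the rows are
-- copied from the tables A, B and C shown in the question.
create table a (id int primary key, start int);
create table b (id int primary key, a_id int references a (id));
create table c (id int primary key, b_id int references b (id), name text);

insert into a (id, start) values (1, 1), (2, 2);
insert into b (id, a_id) values (1, 1), (2, 1), (3, 2);
insert into c (id, b_id, name) values
    (1, 1, 'aname'), (2, 2, 'aname'), (3, 3, 'aname'), (4, 3, 'bname');
```

With this data, only c.id 3 and c.id 4 join to the a row with start = 2, so they are the rows that survive per name.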

Case 2

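select distinct on (c.name)
    a.id, a.start, c.id, c.name
from
    a
    inner join
    b on a.id = b.a_id
    inner join
    c on b.id = c.b_id
where
    -- only a rows that are referenced by more than one b row
    b.a_id in (
        select a_id
        from b
        group by a_id
        having count(*) > 1
    )
-- rows that tie on a.start are picked arbitrarily unless
-- another ORDER BY key is added
order by c.name, a.start desc
;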
 id | start | id | name  
----+-------+----+-------
  1 |     1 |  1 | aname

edited Jul 11, 2016 at 19:02

answered Jul 11, 2016 at 18:54


2 Comments

David E Over a year ago

Thanks for such a fast answer! If I need to extend the distinct to additional columns in c, I assume I append them to the DISTINCT ON list and also to the ORDER BY? Also, I'm guessing performance will degrade quickly as the overall row count of the join grows, because of the multiple sorts?

Clodoaldo Neto Over a year ago

@DavidE You can add items to the ORDER BY clause after the obligatory c.name and the tiebreaker a.start. The select list is free. Check EXPLAIN ANALYZE for the performance.
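
A minimal sketch of that reply, assuming c.id is an acceptable way to break ties (the question does not say which row should win when several share the same a.start). The DISTINCT ON expression stays leftmost in ORDER BY, a.start desc picks the newest row, and any further keys only decide between ties. The aliases a_id and c_id are added here for readability and are not in the original answer.

```sql
-- c.name must stay leftmost in ORDER BY because it is the DISTINCT ON key;
-- a.start desc selects the newest row per name;
-- c.id desc is an assumed tiebreaker that makes the result deterministic.
-- Prefix the statement with "explain analyze" to see the actual plan and timings.
select distinct on (c.name)
    a.id as a_id, a.start, c.id as c_id, c.name
from
    a
    inner join b on a.id = b.a_id
    inner join c on b.id = c.b_id
order by c.name, a.start desc, c.id desc;
```

To make rows distinct on a further column of c (say a hypothetical c.kind), that column would go into both lists: distinct on (c.name, c.kind) with order by c.name, c.kind, a.start desc, since PostgreSQL requires the DISTINCT ON expressions to match the leftmost ORDER BY expressions.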
