I'm writing a query that selects multiple non-aggregated columns together with a GROUP BY clause, but Postgres throws an error saying that I must either add the non-aggregated columns to the GROUP BY or apply an aggregate function to them. This is the query I'm trying to run:

select 
    tb1.pipeline as pipeline_id, 
    tb3.pipeline_name as pipeline_name, 
    tb2."name" as integration_name,
    cast(tb1.integration_id as VARCHAR) as integration_id,
    tb1.created_at as created_at, 
    cast(tb1.id as VARCHAR) as batch_id,
    sum(tb1.row_select) as row_select, 
    sum(tb1.row_insert) as row_insert, 
from 
    table1 tb1
join 
    table2 tb2 on tb1.integration_id = tb2.id 
join 
    table3 tb3 on tb1.pipeline = tb3.id 
where 
    tb1.pipeline is not null 
    and tb1.is_super_parent = false 
group by 
    tb1.pipeline

I found one workaround for this error: wrapping every other non-aggregated column in max(). This solves my problem:

select 
    tb1.pipeline as pipeline_id, 
    max(tb3.pipeline_name) as pipeline_name, 
    max(tb2."name") as integration_name, 
    max(cast(tb1.integration_id as VARCHAR)) as integration_id,
    max(tb1.created_at) as created_at, 
    max(cast(tb1.id as VARCHAR)) as batch_id,
    sum(tb1.row_select) as row_select, 
    sum(tb1.row_insert) as row_insert, 
from 
    table1 tb1
join 
    table2 tb2 on tb1.integration_id = tb2.id 
join 
    table3 tb3 on tb1.pipeline = tb3.id 
where 
    tb1.pipeline is not null 
    and tb1.is_super_parent = false 
group by 
    tb1.pipeline

But I don't want to add max() calls where they aren't needed, and I also expect that applying max() to all the other columns will make the query expensive. Is there a better approach to solve this issue? Thanks in advance.

  • What PostgreSQL version are you using? Your SQL statements do not have a "FROM" clause, so I guess they are incomplete. Commented Nov 29, 2021 at 14:29
  • I have updated the question; I missed the FROM clause earlier. My PostgreSQL version is 13.5. Commented Nov 29, 2021 at 14:37
  • "max to all other column query will be expensive" No, it will not (assuming the MAXed items are actually functionally dependent on the GROUP BY columns). Commented Nov 29, 2021 at 14:42
  • "But i don't want to add max functions when there is no need for that" But as the error message indicates, there is a need for that, so then you are OK with doing it, right? Commented Nov 29, 2021 at 18:49
  • Would you provide a create script for tb1 through tb3? As I understand it, tb3 is a "pipeline" table. What is the meaning of tb1 and tb2? Since 9.1, PG allows non-GROUP BY columns in the query target list when the primary key is specified in the GROUP BY clause. The question is why the tb2 columns are not in GROUP BY in your query. Commented Nov 30, 2021 at 7:35

1 Answer

Well, the first thing you need is to learn to format your queries so that you can see their flow at a glance. Note that, because of the extra comma after row_insert, your query will give a syntax error at from. With that said, how do you solve your issue?
You cannot avoid the additional aggregates or the expanded group by as long as they exist in the scope of the same query. You need to separate the aggregation from the selection of the additional columns. You basically have 2 choices:

  1. Perform the aggregation in a CTE.
    with sums (pipeline_id, row_select, row_insert)  as
         ( select tb1.pipeline
                , sum(tb1.row_select) as row_select
                , sum(tb1.row_insert) as row_insert
             from table1 tb1
            where tb1.pipeline is not null 
              and tb1.is_super_parent = false 
            group by tb1.pipeline 
         )
    select s.pipeline_id
         , tb3.pipeline_name
         , tb2."name" as integration_name 
         , s.row_select
         , s.row_insert 
      from sums s
      -- note: the question joins table2 on tb1.integration_id, not on the pipeline
      join table2 tb2 on (s.pipeline_id = tb2.id)
      join table3 tb3 on (s.pipeline_id = tb3.id);
  2. Perform the aggregation in a sub-query.
    select s.pipeline_id
         , tb3.pipeline_name
         , tb2."name" as integration_name
         , s.row_select
         , s.row_insert 
      from ( select tb1.pipeline as pipeline_id
                  , sum(tb1.row_select) as row_select
                  , sum(tb1.row_insert) as row_insert
               from table1 tb1
              where tb1.pipeline is not null 
                and tb1.is_super_parent = false 
              group by tb1.pipeline 
           ) s
      -- note: the question joins table2 on tb1.integration_id, not on the pipeline
      join table2 tb2 on (s.pipeline_id = tb2.id)
      join table3 tb3 on (s.pipeline_id = tb3.id);

NOTE: Not tested as no sample data supplied.
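
Since no sample data was supplied, here is a minimal runnable sketch of the "aggregate first, join afterwards" pattern. Everything beyond the table and column names from the question is an assumption: the sample rows are made up, and Python's built-in sqlite3 module stands in for PostgreSQL so the sketch runs without a server. SQLite stores the boolean as 0/1, so the filter is written as `not tb1.is_super_parent`, which PostgreSQL should accept for a boolean column as well. Only the join to table3 is shown, because the pipeline key is the only key that survives the grouping.

```python
import sqlite3

# Hypothetical sample data; table and column names follow the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table table3 (id integer primary key, pipeline_name text);
    create table table1 (
        id integer primary key,
        pipeline integer,
        is_super_parent boolean,
        row_select integer,
        row_insert integer
    );
    insert into table3 values (1, 'pipe1'), (2, 'pipe2');
    insert into table1 values
        (10, 1, 0, 5, 3),
        (11, 1, 0, 7, 4),
        (12, 2, 0, 2, 1),
        (13, 2, 1, 100, 100),  -- filtered out: is_super_parent is true
        (14, null, 0, 9, 9);   -- filtered out: pipeline is null
""")

# The CTE does all the grouping; the outer query only joins, so it can
# select any number of non-aggregated columns without touching GROUP BY.
query = """
    with sums (pipeline_id, row_select, row_insert) as (
        select tb1.pipeline,
               sum(tb1.row_select),
               sum(tb1.row_insert)
          from table1 tb1
         where tb1.pipeline is not null
           and not tb1.is_super_parent
         group by tb1.pipeline
    )
    select s.pipeline_id, tb3.pipeline_name, s.row_select, s.row_insert
      from sums s
      join table3 tb3 on s.pipeline_id = tb3.id
     order by s.pipeline_id
"""
rows = conn.execute(query).fetchall()
print(rows)  # [(1, 'pipe1', 12, 7), (2, 'pipe2', 2, 1)]
```

Each pipeline comes back exactly once with its totals, and pipeline_name is selected without max() and without appearing in any GROUP BY.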

3 Comments

Thanks for the answer. What if I need to select more non-aggregated columns? I have also updated the query in my question.
As long as the aggregated columns are isolated into a CTE or a sub-select, you can have as many non-aggregated columns as you wish. Actually, as long as they are isolated, you can have multiple CTEs or sub-selects with aggregation.
The queries provided are OK, but you have to be careful if you plan on filtering. If you add WHERE pipeline_name = 'pipe1', the execution of the query will still involve summing row_select and row_insert for every pipeline. Performance will soon become unacceptable (assuming your tb1 grows to ~1 million records).
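
One possible way to address the filtering concern is to apply the filter inside the aggregating sub-select, so only the matching pipeline's rows are summed. The sketch below is an illustration under assumptions, not a measured result: the sample rows and the helper name `totals_for` are hypothetical, and Python's built-in sqlite3 module stands in for PostgreSQL so it runs without a server. Whether the PostgreSQL planner would already push such a filter down by itself depends on the plan it picks, so check with EXPLAIN on real data.

```python
import sqlite3

# Hypothetical sample data; table and column names follow the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table table3 (id integer primary key, pipeline_name text);
    create table table1 (
        id integer primary key,
        pipeline integer,
        is_super_parent boolean,
        row_select integer,
        row_insert integer
    );
    insert into table3 values (1, 'pipe1'), (2, 'pipe2');
    insert into table1 values
        (10, 1, 0, 5, 3),
        (11, 1, 0, 7, 4),
        (12, 2, 0, 2, 1),
        (13, 2, 1, 100, 100);
""")


def totals_for(pipeline_name):
    """Return the totals for one pipeline, filtering before aggregating."""
    query = """
        select s.pipeline_id, tb3.pipeline_name, s.row_select, s.row_insert
          from ( select tb1.pipeline as pipeline_id,
                        sum(tb1.row_select) as row_select,
                        sum(tb1.row_insert) as row_insert
                   from table1 tb1
                  where tb1.pipeline is not null
                    and not tb1.is_super_parent
                    -- the filter sits inside the sub-select, ahead of GROUP BY
                    and tb1.pipeline in ( select id
                                            from table3
                                           where pipeline_name = ? )
                  group by tb1.pipeline
               ) s
          join table3 tb3 on s.pipeline_id = tb3.id
    """
    return conn.execute(query, (pipeline_name,)).fetchall()


print(totals_for("pipe1"))  # [(1, 'pipe1', 12, 7)]
```

The rows for pipe2 never reach the sum() calls here, which is the point of moving the condition inward.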
