How can I SUM distinct records in a Postgres database where there are duplicate records?

Question

Imagine a table that looks like this:

[Image: table with duplicate data]

The SQL to get this data was just SELECT *. The first column is "row_id", the second is "id" (the order ID), and the third is "total" (the revenue).

I'm not sure why there are duplicate rows in the database, but when I do a SUM(total), it includes the duplicate entries even though the order ID is the same. That makes my numbers larger than if I select distinct(id), total, export to Excel, and sum the values manually.

So my question is: how can I SUM on just the distinct order IDs, so that I get the same revenue as if I exported every distinct order ID row to Excel?

Thanks in advance!

meta.stackoverflow.com/questions/285551/…

– user330315, Commented Dec 15, 2016 at 18:40

Bohemian · Accepted Answer · 2020-07-21 22:27:42Z

28

Easy - just divide by the count:

select id, sum(total) / count(id)
from orders
group by id

See live demo.

This also handles any level of duplication, e.g. triplicates.
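The query above returns one row per id. If a single grand total is wanted, one possible sketch is to wrap it in an outer query. This assumes, as the answer does, that every duplicate of an id carries the same total. The inlined sample rows mirror the test data used in another answer on this page, and the names order_total, per_order and revenue are illustrative, not from the original post.

```sql
-- Sample rows are inlined so the sketch runs on its own (assumed data).
with orders(row_id, id, total) as (
  values (6395, 1509, 112.00),
         (22986, 1509, 112.00),
         (1393, 3284, 40.37),
         (24360, 3284, 40.37)
)
select sum(order_total) as revenue
from (
  -- per id: (n * total) / n = total, valid only if all duplicates agree
  select id, sum(total) / count(id) as order_total
  from orders
  group by id
) as per_order;
```

For these rows the result should be 152.37 (112 + 40.37), possibly printed with trailing zeros because of the numeric division.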

edited Jul 21, 2020 at 22:27

answered Apr 10, 2016 at 1:31

Bohemian♦

15 Comments

rat Over a year ago

This is so clever!

chumakoff Over a year ago

This will not return the sum of total of all distinct records as required.

Bohemian Over a year ago

@sparky just to confirm, this is correct. Each id is independent of, and totally unrelated to, the frequency of another id. It doesn’t matter how many times an id appears. Since each id has only one distinct value, the total is n * total and count is n, so dividing gives total for all values of n (n can’t be zero, of course). If you aren’t convinced, post some sample data to support your belief.

OJFord Over a year ago

No, the result is the mean average of totals distinct on id, not the sum of totals distinct on id. For the sum, we'd need the division by count of ids inside the sum( ), but aggregates can't be nested.

OJFord Over a year ago

That just seems like an over-fit solution - we could also just return literals and prove it works for OP's data.

zedfoxus · Accepted Answer · 2016-04-10 01:04:15Z

7

You can try something like this (with your example):

Table

create table test (
  row_id int,
  id int,
  total decimal(15,2)
);

insert into test values 
(6395, 1509, 112), (22986, 1509, 112), 
(1393, 3284, 40.37), (24360, 3284, 40.37);

Query

with distinct_records as (
  select distinct id, total from test
)

select a.id, b.actual_total, array_agg(a.row_id) as row_ids
from test a
inner join (select id, sum(total) as actual_total from distinct_records group by id) b
  on a.id = b.id
group by a.id, b.actual_total

Result

|   id | actual_total |    row_ids |
|------|--------------|------------|
| 1509 |          112 | 6395,22986 |
| 3284 |        40.37 | 1393,24360 |

Explanation

We do not know why orders and totals appear more than once with different row_id values. So, with a common table expression (CTE) introduced by the with ... clause, we get the distinct id and total pairs.

Below the CTE, we use this distinct data to do the totaling. We join id in the original table with the aggregation over the distinct values. Then we collect the row_ids for each id (shown comma-separated in the result) so that the information looks cleaner.

SQLFiddle example

http://sqlfiddle.com/#!15/72639/3

answered Apr 10, 2016 at 1:04

zedfoxus

1 Comment

Googie Over a year ago

This answer is so undervalued, while it's the only one providing a correct (and almost complete) solution to the question asked! To complete it, it's worth mentioning that in the second part of the CTE query (the actual select a.id....) the author should SELECT sum(b.actual_total) and skip the group by entirely. That will give the sum of total over all rows that are distinct by id.

Minu · Accepted Answer · 2020-08-17 21:13:19Z

5

Create custom aggregate:

CREATE OR REPLACE FUNCTION sum_func (
  double precision, pg_catalog.anyelement, double precision
)
RETURNS double precision AS
$body$
SELECT case when $3 is not null then COALESCE($1, 0) + $3 else $1 end
$body$
LANGUAGE 'sql';

CREATE AGGREGATE dist_sum (
  pg_catalog."any", 
  double precision)
(
  SFUNC = sum_func,
  STYPE = float8
);

Then calculate the distinct sum like this:

select dist_sum(distinct id, total)
from orders

SQLFiddle

answered Aug 17, 2020 at 21:13

Minu

Mike Kruk · Accepted Answer · 2017-07-10 14:49:46Z

2

You can use DISTINCT in your aggregate functions:

SELECT id, SUM(DISTINCT total) FROM orders GROUP BY id

Documentation here: https://www.postgresql.org/docs/9.6/static/sql-expressions.html#SYNTAX-AGGREGATES

answered Jul 10, 2017 at 14:49

Mike Kruk

2 Comments

Mike Kruk Over a year ago

I'm not going to delete this, but this is actually wrong. When using distinct inside of an aggregate function, it gets the distinct values of the column, so if you have any distinct orders with the same total, then your sum is going to be inaccurate. The sum(total) / count(id) wins this one.

Googie Over a year ago

The sum(total) / count(id) is also wrong. It basically returns the arithmetic average of the total column. If you follow @Bohemian's answer (that is, group by id) you will get the average for each id, which is also not what the author was asking for.
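The two comments above can be illustrated with a small sketch. The rows below are made up for the purpose (they are not the asker's data), and the column aliases are hypothetical. Order 100 is duplicated, and order 200 is a different order that happens to have the same total.

```sql
with orders(row_id, id, total) as (
  values (1, 100, 50.00),
         (2, 100, 50.00),   -- duplicate of order 100
         (3, 200, 50.00),   -- different order, same total
         (4, 300, 20.00)
)
select
  -- collapses equal totals across different orders: 50 + 20 = 70
  sum(distinct total) as sum_distinct_total,
  -- without GROUP BY this is just the mean of the column: 170 / 4 = 42.5
  sum(total) / count(id) as overall_mean,
  -- deduplicate by (id, total) first, then sum: 50 + 50 + 20 = 120
  (select sum(total)
   from (select distinct id, total from orders) as d) as sum_per_distinct_order
from orders;
```

Only the last column matches what the question asks for. The first one undercounts as soon as two different orders share a total.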

PaulZi · Accepted Answer · 2019-06-25 09:43:39Z

2

In difficult cases:

select
  id,
  (
    -- numeric instead of int4, so totals such as 40.37 do not fail the cast
    SELECT SUM(value::numeric)
    FROM jsonb_each_text(jsonb_object_agg(row_id, total))
  ) as total
from orders
group by id

answered Jun 25, 2019 at 9:43

PaulZi

Dmitry Krakosevich · Accepted Answer · 2024-01-13 11:30:56Z

1

I have a simpler and more elegant solution to this problem.

select id,
  -- keys of a jsonb object are unique, so each id contributes its total once
  (select sum(value::numeric) from jsonb_each_text(jsonb_object_agg(o.id, o.total))) as total
from orders o
group by id

answered Jan 13, 2024 at 11:30

Dmitry Krakosevich

scottjustin5000 · Accepted Answer · 2016-04-10 01:12:31Z

0

If we can trust that the total for one order is actually one row, we could eliminate the duplicates in a subquery by selecting the MAX of the primary key id column. An example:

CREATE TABLE test2 (id int, order_id int, total int);

insert into test2 values (1,1,50);
insert into test2 values (2,1,50);
insert into test2 values (5,1,50);
insert into test2 values (3,2,100);
insert into test2 values (4,2,100);

select t.order_id, sum(t.total)
from test2 t
join (
  select max(id) as id
  from test2
  group by order_id
) as sq
  on t.id = sq.id
group by t.order_id

sql fiddle
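Whether that trust is justified can be checked before deduplicating. A minimal sketch, reusing the test2 rows from this answer inlined as a CTE so it runs on its own (the aliases copies and distinct_totals are illustrative): it lists any order_id whose duplicate rows disagree on total.

```sql
with test2(id, order_id, total) as (
  values (1, 1, 50), (2, 1, 50), (5, 1, 50),
         (3, 2, 100), (4, 2, 100)
)
select order_id,
       count(*)              as copies,
       count(distinct total) as distinct_totals
from test2
group by order_id
-- keep only orders whose copies carry more than one total value
having count(distinct total) > 1;
```

For the sample rows this returns no rows, which means picking any one row per order_id gives the same total.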

answered Apr 10, 2016 at 1:12

scottjustin5000

Jaques Rheeder · Accepted Answer · 2019-06-25 10:31:56Z

0

I would suggest just using a subquery:

SELECT "a"."id", SUM("a"."total")
FROM (SELECT DISTINCT ON ("id") * FROM "Database"."Schema"."Table") AS "a"
GROUP BY "a"."id"

The above will give you the total for each id.

Use the query below if you want the overall total with the duplicates removed:

SELECT SUM("a"."total")
FROM (SELECT DISTINCT ON ("id") * FROM "Database"."Schema"."Table") AS "a"
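Without an ORDER BY, DISTINCT ON keeps an unpredictable row from each group. That does not matter when the duplicates are identical, but if they might differ, a sketch like the following makes the choice explicit. It assumes a table named orders with a row_id column as in the question; the sample rows and the alias one_row_per_order are made up for illustration.

```sql
with orders(row_id, id, total) as (
  values (6395, 1509, 112.00),
         (22986, 1509, 112.00),
         (1393, 3284, 40.37),
         (24360, 3284, 40.37)
)
select sum(total) as revenue
from (
  select distinct on (id) id, total
  from orders
  -- the leading ORDER BY column must match DISTINCT ON;
  -- row_id desc keeps the row with the highest row_id for each id
  order by id, row_id desc
) as one_row_per_order;
```

Sorting by row_id descending is only one possible rule; any tie-breaker that reflects which copy should win would work the same way.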

answered Jun 25, 2019 at 10:31

Jaques Rheeder

Googie · Accepted Answer · 2021-05-25 22:45:45Z

0

Using subselect (http://sqlfiddle.com/#!7/cef1c/51):

select sum(total) from (
  select distinct id, total
  from orders
) as distinct_records  -- Postgres versions before 16 require an alias here

Using CTE (http://sqlfiddle.com/#!7/cef1c/53):

with distinct_records as (
  select distinct id, total from orders
)
select sum(total) from distinct_records;

answered May 25, 2021 at 22:45

Googie
