Question (score 18)

Imagine a table that looks like this:

[Image: table with duplicate data]

The SQL to get this data was just SELECT *. The first column is "row_id", the second is "id" (the order ID), and the third is "total" (the revenue).

I'm not sure why there are duplicate rows in the database, but when I do a SUM(total), it includes the second entry even though the order ID is the same. That makes my numbers larger than if I SELECT DISTINCT id, total, export to Excel, and then sum the values manually.

So my question is: how can I SUM on just the distinct order IDs so that I get the same revenue as if I exported every distinct order ID row to Excel?

Thanks in advance!
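
To reproduce the mismatch, here is a minimal sketch using Python's built-in sqlite3 module, with SQLite standing in for PostgreSQL. It assumes the four rows used in the CTE answer below reflect the data in the image; the table name orders and the helper names are illustrative. naive_sum is the plain SUM(total), and excel_style_sum mimics the Excel workflow by fetching the distinct (id, total) rows and adding them up outside the database.

```python
import sqlite3

# Assumed sample data: (row_id, id, total), taken from the CTE answer's example.
ROWS = [
    (6395, 1509, 112.00),
    (22986, 1509, 112.00),
    (1393, 3284, 40.37),
    (24360, 3284, 40.37),
]


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (row_id INTEGER, id INTEGER, total REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", ROWS)
    return conn


def naive_sum(conn):
    # SUM(total) over every row counts each duplicated order twice.
    (total,) = conn.execute("SELECT SUM(total) FROM orders").fetchone()
    return round(total, 2)


def excel_style_sum(conn):
    # Fetch the distinct (id, total) rows, then add them up client-side,
    # as the manual export-to-Excel workflow does.
    rows = conn.execute("SELECT DISTINCT id, total FROM orders").fetchall()
    return round(sum(total for _id, total in rows), 2)


if __name__ == "__main__":
    conn = make_db()
    print(naive_sum(conn))        # 304.74
    print(excel_style_sum(conn))  # 152.37
```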

9 Answers

Answer (score 28)

Easy - just divide by the count:

select id, sum(total) / count(id)
from orders
group by id

This also handles any level of duplication, e.g. triplicates.
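
A sketch of why this works, and how to turn it into a single revenue figure, using Python's sqlite3 (standing in for PostgreSQL) with the example rows from the CTE answer plus one extra hypothetical row that makes order 3284 a triplicate. Each group of n identical rows gives n * total / n = total. Aggregates cannot be nested directly, so the per-order results are summed in an outer query; the alias per_order is illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (row_id INTEGER, id INTEGER, total REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        (6395, 1509, 112.0), (22986, 1509, 112.0),
        (1393, 3284, 40.37), (24360, 3284, 40.37),
        (7000, 3284, 40.37),  # hypothetical third copy: a triplicate
    ],
)

# The answer's query: n identical rows give n * total / n = total per id.
PER_ORDER = "SELECT id, SUM(total) / COUNT(id) AS order_total FROM orders GROUP BY id"
per_order = [
    (order_id, round(order_total, 2))
    for order_id, order_total in conn.execute(PER_ORDER + " ORDER BY id")
]

# Aggregates cannot be nested, so sum the per-order rows in an outer query.
(grand_total,) = conn.execute(
    f"SELECT SUM(order_total) FROM ({PER_ORDER}) AS per_order"
).fetchone()

print(per_order)              # [(1509, 112.0), (3284, 40.37)]
print(round(grand_total, 2))  # 152.37
```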

15 Comments

This is so clever!
This will not return the sum of total of all distinct records as required.
@sparky just to confirm, this is correct. Each id is independent of, and totally unrelated to, the frequency of another id. It doesn’t matter how many times an id appears. Since each id has only one distinct value, the total is n * total and count is n, so dividing gives total for all values of n (n can’t be zero, of course). If you aren’t convinced, post some sample data to support your belief.
No, the result is the mean average of totals distinct on id, not the sum of totals distinct on id. For the sum, we'd need the division by count of ids inside the sum( ), but aggregates can't be nested.
That just seems like an over-fit solution - we could also just return literals and prove it works for OP's data.

Answer (score 7)

You can try something like this (with your example):

Table

create table test (
  row_id int,
  id int,
  total decimal(15,2)
);

insert into test values 
(6395, 1509, 112), (22986, 1509, 112), 
(1393, 3284, 40.37), (24360, 3284, 40.37);

Query

with distinct_records as (
  select distinct id, total from test
)

select a.id, b.actual_total, array_agg(a.row_id) as row_ids
from test a
inner join (select id, sum(total) as actual_total from distinct_records group by id) b
  on a.id = b.id
group by a.id, b.actual_total

Result

|   id | actual_total |    row_ids |
|------|--------------|------------|
| 1509 |          112 | 6395,22986 |
| 3284 |        40.37 | 1393,24360 |

Explanation

We don't know why orders and totals appear more than once with different row_id values, so we use a common table expression (CTE), introduced with WITH ..., to get the distinct id and total pairs.

Below the CTE, we aggregate this distinct data: the subquery b sums the distinct totals per id, and we join it to the original table on id. Finally, array_agg collects each order's row_ids into one list so the output is easier to read.

SQLFiddle example

http://sqlfiddle.com/#!15/72639/3

1 Comment

This answer is so undervalued, even though it's the only one providing a correct (and almost complete) solution to the question asked! To complete it, it's worth mentioning that in the second part of the query (the actual select a.id ...), the author should SELECT sum(b.actual_total) and skip the GROUP BY entirely. That gives the sum of total over all rows distinct by id.
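
A minimal sketch of this answer's query in Python's sqlite3, with SQLite standing in for PostgreSQL and group_concat() in place of array_agg() (SQLite has no array_agg). It also computes a single grand total. Summing b.actual_total over the join with test repeats each order's total once per duplicate row, so the sketch sums b by itself; CTE and B are just illustrative string fragments.

```python
import sqlite3

# The answer's example data; SQLite stands in for PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (row_id INTEGER, id INTEGER, total REAL)")
conn.executemany(
    "INSERT INTO test VALUES (?, ?, ?)",
    [(6395, 1509, 112), (22986, 1509, 112), (1393, 3284, 40.37), (24360, 3284, 40.37)],
)

CTE = "WITH distinct_records AS (SELECT DISTINCT id, total FROM test) "
B = "(SELECT id, SUM(total) AS actual_total FROM distinct_records GROUP BY id) b"

# Per-order totals plus the contributing row_ids, as in the answer
# (group_concat replaces array_agg).
per_order = conn.execute(
    CTE + "SELECT a.id, b.actual_total, group_concat(a.row_id) AS row_ids "
    "FROM test a JOIN " + B + " ON a.id = b.id "
    "GROUP BY a.id, b.actual_total ORDER BY a.id"
).fetchall()

# Grand total: sum b on its own.
(grand_total,) = conn.execute(CTE + "SELECT SUM(b.actual_total) FROM " + B).fetchone()

# For comparison: keeping the join repeats each actual_total per duplicate row.
(joined_total,) = conn.execute(
    CTE + "SELECT SUM(b.actual_total) FROM test a JOIN " + B + " ON a.id = b.id"
).fetchone()

print(per_order)               # row_ids order within each group is not guaranteed
print(round(grand_total, 2))   # 152.37
print(round(joined_total, 2))  # 304.74
```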

Answer (score 5)

Create custom aggregate:

CREATE OR REPLACE FUNCTION sum_func (
  double precision, pg_catalog.anyelement, double precision
)
RETURNS double precision AS
$body$
SELECT case when $3 is not null then COALESCE($1, 0) + $3 else $1 end
$body$
LANGUAGE sql;

CREATE AGGREGATE dist_sum (
  pg_catalog."any", 
  double precision)
(
  SFUNC = sum_func,
  STYPE = float8
);

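Then calculate the distinct sum like this (DISTINCT applies to the (id, total) pair, so each order is counted once as long as its duplicate rows are identical):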

select dist_sum(distinct id, total)
from orders
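
SQLite only allows DISTINCT on single-argument aggregates, so a Python sqlite3 version of this idea cannot use dist_sum(distinct id, total) as written. The sketch below registers a hypothetical two-argument DistSum aggregate through Connection.create_aggregate that deduplicates by id internally instead. One difference from the PostgreSQL version: there, DISTINCT removes repeated (id, total) pairs, so an id whose rows carry different totals is counted once per distinct total, whereas this sketch keeps only the first total it sees per id. The sample rows are assumed from the CTE answer.

```python
import sqlite3


class DistSum:
    """Hypothetical SQLite aggregate: adds `total` once per distinct `id`.

    Instead of dist_sum(DISTINCT id, total), deduplication happens inside
    the aggregate: the first non-NULL total seen for each id is kept.
    """

    def __init__(self):
        self.totals_by_id = {}

    def step(self, order_id, total):
        # NULL totals are skipped, like the "$3 is not null" check in sum_func.
        if total is not None and order_id not in self.totals_by_id:
            self.totals_by_id[order_id] = total

    def finalize(self):
        # Return NULL when every total was NULL, as SUM does.
        if not self.totals_by_id:
            return None
        return sum(self.totals_by_id.values())


conn = sqlite3.connect(":memory:")
conn.create_aggregate("dist_sum", 2, DistSum)
conn.execute("CREATE TABLE orders (row_id INTEGER, id INTEGER, total REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(6395, 1509, 112.0), (22986, 1509, 112.0), (1393, 3284, 40.37), (24360, 3284, 40.37)],
)

(result,) = conn.execute("SELECT dist_sum(id, total) FROM orders").fetchone()
print(round(result, 2))  # 152.37
```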

Answer (score 2)

You can use DISTINCT in your aggregate functions:

SELECT id, SUM(DISTINCT total) FROM orders GROUP BY id

Documentation here: https://www.postgresql.org/docs/9.6/static/sql-expressions.html#SYNTAX-AGGREGATES

2 Comments

I'm not going to delete this, but this is actually wrong. When using distinct inside of an aggregate function, it gets the distinct values of the column, so if you have any distinct orders with the same total, then your sum is going to be inaccurate. The sum(total) / count(id) wins this one.
The sum(total) / count(id) is also wrong. It basically returns the arithmetic average of the total column. If you follow @Bohemian's answer (that is, group by id), you get the average for each id, which is also not what the author was asking for.
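
The first comment's point can be checked with Python's sqlite3 (standing in for PostgreSQL) on hypothetical data in which two different orders share the same total. Without GROUP BY, SUM(DISTINCT total) collapses those equal totals across orders. With GROUP BY id, as in the answer, DISTINCT only applies within each order, so each per-order value is right provided an order's duplicate rows are identical; the per-order values still have to be added up to get overall revenue.

```python
import sqlite3

# Hypothetical data: orders 1 and 2 are different orders that happen to
# share the same total (50), and each is duplicated.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (row_id INTEGER, id INTEGER, total INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 1, 50), (2, 1, 50), (3, 2, 50), (4, 2, 50), (5, 3, 30)],
)

# Without GROUP BY, DISTINCT collapses equal totals across different orders.
(undercount,) = conn.execute("SELECT SUM(DISTINCT total) FROM orders").fetchone()

# With GROUP BY id, DISTINCT only collapses totals within one order.
per_order = conn.execute(
    "SELECT id, SUM(DISTINCT total) FROM orders GROUP BY id ORDER BY id"
).fetchall()

print(undercount)                      # 80, but the true revenue is 130
print(sum(t for _id, t in per_order))  # 130
```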

Answer (score 2)

In difficult cases:

select
  id,
  (
    SELECT SUM(value::numeric)
    FROM jsonb_each_text(jsonb_object_agg(row_id, total))
  ) as total
from orders
group by id
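
These JSONB queries work because a JSON object holds one value per key, so rows sharing a key collapse before the sum; which rows count as duplicates therefore depends on the key. The sketch below is a rough pure-Python model, with a dict standing in for jsonb_object_agg (an assumption, not PostgreSQL itself) and the example rows from the CTE answer. With the question's data each duplicate has its own row_id, so keying by row_id collapses nothing, while keying by id removes the duplicates.

```python
def object_agg(pairs):
    """Rough model of jsonb_object_agg: keys become text, and a repeated key
    keeps a single value (here the later one overwrites the earlier)."""
    obj = {}
    for key, value in pairs:
        obj[str(key)] = value
    return obj


# Assumed sample rows: (row_id, id, total), from the CTE answer's example.
ROWS = [
    (6395, 1509, 112.00), (22986, 1509, 112.00),
    (1393, 3284, 40.37), (24360, 3284, 40.37),
]


def total_keyed_by(key_index):
    # key_index 0 keys the object by row_id, 1 keys it by id.
    obj = object_agg((row[key_index], row[2]) for row in ROWS)
    return round(sum(obj.values()), 2)


print(total_keyed_by(0))  # 304.74: keyed by row_id, nothing collapses
print(total_keyed_by(1))  # 152.37: keyed by id, duplicates collapse
```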

Answer (score 1)

I have a simpler and more elegant solution to this problem.

select
(select sum(value::numeric) from jsonb_each_text(jsonb_object_agg(id, total))) as total
from orders

Answer (score 0)

If we can trust that each order's total sits in a single row (the other rows being exact duplicates), we can eliminate the duplicates in a sub-query by selecting the MAX of the primary-key id column. An example:

CREATE TABLE test2 (id int, order_id int, total int);

insert into test2 values (1,1,50);
insert into test2 values (2,1,50);
insert into test2 values (5,1,50);
insert into test2 values (3,2,100);
insert into test2 values (4,2,100);

select order_id, sum(total)
   from test2 t
   join (
     select max(id) as id
      from test2 
       group by order_id) as sq
  on t.id = sq.id
  group by order_id 
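
The same query runs unchanged in SQLite, so here is a sketch using Python's sqlite3 with the answer's test2 data (SQLite standing in for PostgreSQL). Dropping the outer GROUP BY turns the per-order totals into one overall figure; SURVIVORS is just an illustrative name for the subquery text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test2 (id INTEGER, order_id INTEGER, total INTEGER)")
conn.executemany(
    "INSERT INTO test2 VALUES (?, ?, ?)",
    [(1, 1, 50), (2, 1, 50), (5, 1, 50), (3, 2, 100), (4, 2, 100)],
)

# Subquery: the highest id per order_id, i.e. one surviving row per order.
SURVIVORS = "(SELECT MAX(id) AS id FROM test2 GROUP BY order_id) AS sq"

per_order = conn.execute(
    f"SELECT order_id, SUM(total) FROM test2 t JOIN {SURVIVORS} ON t.id = sq.id "
    "GROUP BY order_id ORDER BY order_id"
).fetchall()

# Without the outer GROUP BY, the same join yields one overall total.
(grand,) = conn.execute(
    f"SELECT SUM(total) FROM test2 t JOIN {SURVIVORS} ON t.id = sq.id"
).fetchone()

print(per_order)  # [(1, 50), (2, 100)]
print(grand)      # 150
```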

Answer (score 0)

I would suggest just using a subquery:

SELECT "a"."id", SUM("a"."total")
FROM (SELECT DISTINCT ON ("id") * FROM "Database"."Schema"."Table") AS "a"
GROUP BY "a"."id"

The above gives you the total for each id.

Use the query below if you want the overall total with duplicates removed:

SELECT SUM("a"."total")
FROM (SELECT DISTINCT ON ("id") * FROM "Database"."Schema"."Table") AS "a"
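
SQLite has no DISTINCT ON, so this Python sqlite3 sketch swaps in a ROW_NUMBER() window (one row per id, lowest row_id kept) to run the same two queries. The sample rows are assumed from the CTE answer, the table is called orders rather than "Database"."Schema"."Table", and window functions need SQLite 3.25 or later.

```python
import sqlite3

# Assumed sample data; SQLite stands in for PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (row_id INTEGER, id INTEGER, total REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(6395, 1509, 112.0), (22986, 1509, 112.0), (1393, 3284, 40.37), (24360, 3284, 40.37)],
)

# Stand-in for DISTINCT ON ("id"): keep the first row per id. ORDER BY row_id
# makes the surviving row deterministic; DISTINCT ON without ORDER BY picks an
# unpredictable row per id, which is harmless only if duplicates are identical.
ONE_ROW_PER_ID = """
    SELECT * FROM (
        SELECT *, ROW_NUMBER() OVER (PARTITION BY id ORDER BY row_id) AS rn
        FROM orders
    ) WHERE rn = 1
"""

per_id = conn.execute(
    f'SELECT "a"."id", SUM("a"."total") FROM ({ONE_ROW_PER_ID}) AS "a" '
    'GROUP BY "a"."id" ORDER BY "a"."id"'
).fetchall()

(full_total,) = conn.execute(
    f'SELECT SUM("a"."total") FROM ({ONE_ROW_PER_ID}) AS "a"'
).fetchone()

print(per_id)                # [(1509, 112.0), (3284, 40.37)]
print(round(full_total, 2))  # 152.37
```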

Answer (score 0)

Using a subselect (http://sqlfiddle.com/#!7/cef1c/51):

select sum(total) from (
  select distinct id, total
  from orders
) as d

Using a CTE (http://sqlfiddle.com/#!7/cef1c/53):

with distinct_records as (
  select distinct id, total from orders
)
select sum(total) from distinct_records;
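
Both forms sum one row per distinct (id, total) pair, not one row per id, so if an order's duplicate rows ever disagree on total, that order is counted once per distinct value. This Python sqlite3 sketch (SQLite standing in for PostgreSQL) uses hypothetical data with one such conflicting row, gives the subquery the alias d (PostgreSQL before version 16 requires an alias on a subquery in FROM), and shows a quick check comparing the number of distinct rows with the number of distinct ids.

```python
import sqlite3

# Hypothetical data: the second row for order 3284 carries a different
# total (40.00), e.g. a corrected figure.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (row_id INTEGER, id INTEGER, total REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(6395, 1509, 112.0), (22986, 1509, 112.0), (1393, 3284, 40.37), (24360, 3284, 40.0)],
)

DISTINCT_ROWS = "SELECT DISTINCT id, total FROM orders"

(revenue,) = conn.execute(f"SELECT SUM(total) FROM ({DISTINCT_ROWS}) AS d").fetchone()

# Sanity check: if the distinct (id, total) rows outnumber the distinct ids,
# some order has conflicting totals and is being counted more than once.
(n_rows,) = conn.execute(f"SELECT COUNT(*) FROM ({DISTINCT_ROWS}) AS d").fetchone()
(n_ids,) = conn.execute("SELECT COUNT(DISTINCT id) FROM orders").fetchone()

print(round(revenue, 2))  # 192.37: order 3284 contributes both 40.37 and 40.0
print(n_rows, n_ids)      # 3 2
```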
