15

I have a table with arrays as one column, and I want to sum the array elements together:

> create table regres(a int[] not null);
> insert into regres values ('{1,2,3}'), ('{9, 12, 13}');
> select * from regres;
     a
-----------
 {1,2,3}
 {9,12,13}

I want the result to be:

{10, 14, 16}

that is: {1 + 9, 2 + 12, 3 + 13}.

Does such a function already exist somewhere? The intagg extension looked like a good candidate, but such a function does not already exist.

The arrays are expected to be between 24 and 31 elements in length, all elements are NOT NULL, and the arrays themselves will also always be NOT NULL. All elements are basic int. There will be more than two rows per aggregate. All arrays will have the same number of elements, in a query. Different queries will have different number of elements.

My implementation target is: PostgreSQL 9.1.13

3
  • 1
    Also posted to pgsql-general mailing list: Pairwise array sum aggregate function? Commented Jul 28, 2014 at 14:10
  • 1
    If you can live with some excessive typing: select array[sum(a[1]), sum(a[2]), sum(a[3]), sum(a[4]), ....] from regres; Commented Jul 28, 2014 at 14:17
  • @a_horse_with_no_name I hadn't thought of that. Excellent thinking outside of the box. I generate the query, so it's not really a problem to write such a select clause. Commented Jul 28, 2014 at 14:25

3 Answers 3

18

General solutions for any number of arrays with any number of elements. Individual elements or the the whole array can be NULL, too:

Simpler in 9.4+ using WITH ORDINALITY

SELECT ARRAY (
   SELECT sum(elem)
   FROM  tbl t
       , unnest(t.arr) WITH ORDINALITY x(elem, rn)
   GROUP BY rn
   ORDER BY rn
   );

See:

Postgres 9.3+

This makes use of an implicit LATERAL JOIN

SELECT ARRAY (
   SELECT sum(arr[rn])
   FROM   tbl t
        , generate_subscripts(t.arr, 1) AS rn
   GROUP  BY rn
   ORDER  BY rn
   );

See:

Postgres 9.1

SELECT ARRAY (
   SELECT sum(arr[rn])
   FROM  (
      SELECT arr, generate_subscripts(arr, 1) AS rn
      FROM   tbl t
      ) sub
   GROUP  BY rn
   ORDER  BY rn
   );

The same works in later versions, but set-returning functions in the SELECT list are not standard SQL and were frowned upon by some. Should be OK since Postgres 10, though. See:

db<>fiddle here
Old sqlfiddle

Related:

Sign up to request clarification or add additional context in comments.

3 Comments

nice usage of lateral join
Thank you, but I'm on 9.1. I have plans to migrate later this year, but in the meantime, can something similar be achieved for 9.1?
@FrançoisBeausoleil: Added a version for 9.1.
4

If you need better performances and can install Postgres extensions, the agg_for_vecs C extension provides a vec_to_sum function that should meet your need. It also offers various aggregate functions like min, max, avg, and var_samp that operate on arrays instead of scalars.

Comments

4

I know the original question and answer are pretty old, but for others who find this... The most elegant and flexible solution I've found is to create a custom aggregate function. Erwin's answer presents some great simple solutions if you only need the single resulting array, but doesn't translate to a solution that could include other table columns and aggregations, in a GROUP BY for example.

With a custom array_add function and array_sum aggregate function:

CREATE OR REPLACE FUNCTION array_add(_a numeric[], _b numeric[])
  RETURNS numeric[]
AS
$$
BEGIN
  RETURN ARRAY(
    SELECT coalesce(a, 0) + coalesce(b, 0)
    FROM unnest(_a, _b) WITH ORDINALITY AS x(a, b, n)
    ORDER BY n
  );
END
$$ LANGUAGE plpgsql;

CREATE AGGREGATE array_sum(numeric[])
(
  sfunc = array_add,
  stype = numeric[],
  initcond = '{}'
);

Then (using the names from your example):

SELECT array_sum(a) a_sums
FROM regres;

Returns your array of sums, and it can just as well be used anywhere other aggregate functions could be used, so if your table also had a column name you wanted to group by, and another array of numbers, column b:

SELECT name, array_sum(a) a_sums, array_sum(b) b_sums
FROM regres
GROUP BY name;

You won't get quite the performance you'd get out of the built-in sum function and just selecting sum(a[1]), sum(a[2]), sum(a[3]), you'd have to implement the array_add function as a compiled C function to get that. But in cases where you don't have the ability to add custom C functions (like a managed cloud database, e.g. AWS RDS), or you're not aggregating huge numbers of rows, the difference probably won't be noticed.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.