57

I have a column of type integer array. How can I merge the arrays from all rows into a single integer array?

For example: If I execute query:

select column_name from table_name

I get result set as:

-[RECORD 1]----------
column_name | {1,2,3}
-[RECORD 2]----------
column_name | {4,5}

How can I get {1,2,3,4,5} as final result?

7 Answers

76

You could use unnest to open up the arrays and then array_agg to put them back together:

select array_agg(c)
from (
  select unnest(column_name)
  from table_name
) as dt(c);
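As the comments note, this doesn't guarantee element order. On PostgreSQL 9.4 and later, a sketch using WITH ORDINALITY would be (the `id` column used to order the rows is an assumption; substitute your own ordering key):

```sql
select array_agg(u.elem order by t.id, u.ord)
from table_name t
cross join lateral unnest(t.column_name) with ordinality as u(elem, ord);
```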

5 Comments

Likely much more efficient than mine, but won't necessarily retain element order; you'd have to use with ordinality for that.
@Craig: Which version of PostgreSQL has WITH ORDINALITY? Anyway, custom aggregates are kinda' cool
Added in PostgreSQL 9.4, so "coming soon". I'm too used to working with git master...
Also works well to get the unique elements of all the arrays, by simply adding a distinct before the unnest.
25

Define a trivial custom aggregate:

CREATE AGGREGATE array_cat_agg(anyarray) (
  SFUNC=array_cat,
  STYPE=anyarray
);

and use it:

WITH v(a) AS ( VALUES (ARRAY[1,2,3]), (ARRAY[4,5,6,7]))
SELECT array_cat_agg(a) FROM v;

If you want a particular order, put it within the aggregate call, i.e. array_cat_agg(a ORDER BY ...)

This is O(n²) for n rows (I originally guessed O(n log n); see the comments for the working), so it is unsuitable for long sets of rows. For better performance you'd need to write it in C, where you can use the more efficient (but horrible to use) C API for PostgreSQL arrays to avoid re-copying the array on each iteration.

6 Comments

FWIW this aggregate runs in quadratic time O(n^2) and so isn't suitable for large datasets. source: I used it on some large datasets in production and had to rip it out =)
@JohnBledsoe I'm surprised it's O(n^2), are you sure? It copies the whole array once per iteration, including all prior members, when it constructs a new one. Either way, it'll still be plenty slow for long inputs.
I've been out of CS school for a long time, so I'm not sure =) but yeah copying an N-length array N times is O(n^2) IIRC.
@JohnBledsoe The array starts at length 1. Each time you copy it, it grows by 1 element. Assuming each input array is of the same length (treated as 1 element for this purpose): 1 elements copied @ n=1 . 3 elements copied @ n=2 . 6 elements copied @ n=3. 10 elements copied @ n=4. It's a series sum n∑n . Which is (n·n)/2 or n²/2 .. so O(n^2). You're quite right. Pg doesn't have mutable arrays at the SQL level so you'd need to use a PL (say, Python with numpy or intarray) or use C to do it more efficiently.
Not sure about the mathematics here, but from my experience it was very slow too. Took forever (I gave up after 30 seconds) on a 300K rows table, while mu-is-too-short's solution took 215ms.
4

To merge arrays you can use the || operator.

To put the result into a flat list, use the unnest function.

example:

select unnest(ARRAY[1,2] || ARRAY[3,2] || ARRAY[4,5]) as number;

result:

 number
--------
      1
      2
      3
      2
      4
      5

Comments

3
string_to_array(string_agg(array_to_string(column_name ,','),','),',')

This may give you a clue for your situation; I've done it like this.
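Spelled out against the question's table, the idea is roughly this (a sketch: string_to_array returns text[], hence the cast, and the round-trip through text assumes no NULL elements):

```sql
select string_to_array(
         string_agg(array_to_string(column_name, ','), ','),
         ','
       )::int[]
from table_name;
```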

1 Comment

Please provide additional details in your answer. As it's currently written, it's hard to understand your solution.
2

You could use lateral subquery for that:

select array_agg(u.a)
from (values (array[1, 2, 3]), (array[4, 5])) t (a)
    join lateral unnest(t.a) u (a) on true;
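Applied to the question's table rather than the inline VALUES list, the same shape would be (table and column names taken from the question):

```sql
select array_agg(u.a)
from table_name t
    join lateral unnest(t.column_name) u (a) on true;
```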

Comments

0

To enrich Craig Ringer's answer: his CREATE AGGREGATE solution is good but reportedly slow. That's because it only has an SFUNC defined, which means it can only fold inputs into the accumulated array one row at a time.

This can be improved by defining the aggregate as a partial aggregate. In this mode it can collect sub-arrays, possibly in parallel, and then merge those sub-arrays into larger ones. This is done by adding a COMBINEFUNC definition, which in this case is also just array_cat. I also mark the aggregate as PARALLEL SAFE.

CREATE AGGREGATE public.array_agg_flat(anycompatiblearray) (
    SFUNC = array_cat,
    COMBINEFUNC = array_cat,
    STYPE = anycompatiblearray,
    PARALLEL = SAFE
);

Under a parallel plan, this can substantially reduce the number of array (re)allocations occurring behind the scenes.
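Usage looks the same as any other aggregate (table and column names taken from the question); whether a parallel plan is actually used is up to the planner:

```sql
SELECT public.array_agg_flat(column_name) FROM table_name;
```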

Comments

-2

One way you can do this is inside a function:

CREATE FUNCTION merge_arrays() RETURNS int[] AS $$
DECLARE
  this record;
  res  int[];
BEGIN
  FOR this IN
    SELECT column_name FROM table_name
  LOOP
    -- a bare function call is not a valid PL/pgSQL statement; assign the result
    res := array_cat(res, this.column_name);
  END LOOP;
  RETURN res;
END; $$ LANGUAGE plpgsql;

Then you can

SELECT merge_arrays();

to get the result you are looking for.

This of course hard-codes your table definition into the function, which may (or may not) be an issue. In addition, you may want to put a WHERE clause in the loop query to restrict the records whose arrays you want to append; you might use an additional function parameter to do this.
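A sketch of that parameterized variant (the filter column some_column and its type are assumptions for illustration):

```sql
CREATE FUNCTION merge_arrays(p_filter int) RETURNS int[] AS $$
DECLARE
  this record;
  res  int[];
BEGIN
  FOR this IN
    SELECT column_name FROM table_name
    WHERE some_column = p_filter  -- hypothetical filter column
  LOOP
    res := array_cat(res, this.column_name);
  END LOOP;
  RETURN res;
END; $$ LANGUAGE plpgsql;
```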

Keep in mind that you might get a really large array as your table increases in size and that may affect performance. Do you really need all sub-arrays from all records in one large array? Have a look at your application and see if you can do the merge at that level, rather than in a single query.

Comments
