PostgreSQL array unique aggregation

Question

I have a large table with the following structure:

CREATE TABLE t (
  id SERIAL primary key ,
  a_list int[] not null,
  b_list int[] not null,
  c_list int[] not null,
  d_list int[] not null,
  type int not null 
)

I want to query all unique values from a_list, b_list, c_list, and d_list for a given type, like this:

    select 
        some_array_unique_agg_function(a_list), 
        some_array_unique_agg_function(b_list), 
        some_array_unique_agg_function(c_list), 
        some_array_unique_agg_function(d_list),
        count(1) 
    from t
    where type = 30

For example, for this data:

+----+---------+--------+--------+---------+------+
| id | a_list  | b_list | c_list | d_list  | type |
+----+---------+--------+--------+---------+------+  
| 1  | {1,3,4} | {2,4}  | {1,1}  | {2,4,5} | 30   |
| 2  | {1,2,4} | {2,4}  | {4,1}  | {2,4,5} | 30   |
| 3  | {1,3,5} | {2}    | {}     | {2,4,5} | 30   |
+----+---------+--------+--------+---------+------+

I want the following result:

+-------------+--------+--------+-----------+-------+
| a_list      | b_list | c_list | d_list    | count |
+-------------+--------+--------+-----------+-------+  
| {1,2,3,4,5} | {2,4}  | {1,4}  | {2,4,5}   | 3     |
+-------------+--------+--------+-----------+-------+

Is there a function like some_array_unique_agg_function that does this?

@a_horse_with_no_name I need an aggregate function like uniq that merges all values from the rows.
– Yevhen Bondar, Commented Aug 5, 2019 at 12:57
@a_horse_with_no_name array_agg(distinct ...) works for scalar values, but my columns have type int[].
– Yevhen Bondar, Commented Aug 5, 2019 at 13:02
Obviously you need to unnest those values before you can aggregate them with distinct. To be honest: if you need something like that, you should probably think about normalizing your model.
– user330315, Commented Aug 5, 2019 at 13:04

404 · Accepted Answer · 2019-08-05 13:13:13Z

5

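Try this:

with cte as (
    select
        -- Several unnest() calls in one select list run in lockstep (PostgreSQL 10+),
        -- so shorter arrays are padded with NULL; see the NULLs in the result below.
        -- The ::text[] / ::integer casts are redundant for int[] columns.
        unnest( a_list::text[] )::integer as a_list,
        unnest( b_list::text[] )::integer as b_list,
        unnest( c_list::text[] )::integer as c_list,
        unnest( d_list::text[] )::integer as d_list,
        -- Count only the rows that match the filter, not the whole table.
        (select count(*) from t where type = 30) as row_count
    from t
    where type = 30
)
select
    array_agg(distinct a_list),
    array_agg(distinct b_list),
    array_agg(distinct c_list),
    array_agg(distinct d_list),
    row_count
from cte
group by row_count;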

Result:

"{1,2,3,4,5}";"{2,4,NULL}";"{1,4,NULL}";"{2,4,5}";3
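
The NULL entries in the result above come from the shorter arrays being padded while the columns are unnested side by side. A possible variant that drops the padding is sketched below. It assumes PostgreSQL 9.4 or later (multi-argument unnest in FROM and the FILTER clause) and that id is a real primary key; the aliases u, a, b, c, d are only illustrative names.

```sql
-- Sketch only: one scan of t, unnesting the four arrays side by side.
-- unnest(a, b, c, d) in FROM pads the shorter arrays with NULL;
-- FILTER removes that padding before the values are aggregated.
SELECT
    array_agg(DISTINCT u.a ORDER BY u.a) FILTER (WHERE u.a IS NOT NULL) AS a_list,
    array_agg(DISTINCT u.b ORDER BY u.b) FILTER (WHERE u.b IS NOT NULL) AS b_list,
    array_agg(DISTINCT u.c ORDER BY u.c) FILTER (WHERE u.c IS NOT NULL) AS c_list,
    array_agg(DISTINCT u.d ORDER BY u.d) FILTER (WHERE u.d IS NOT NULL) AS d_list,
    -- Unnesting multiplies rows, so count distinct ids rather than count(*).
    count(DISTINCT t.id) AS count
FROM t
-- LEFT JOIN ... ON true keeps rows whose four arrays are all empty,
-- so they still contribute to the count.
LEFT JOIN LATERAL unnest(t.a_list, t.b_list, t.c_list, t.d_list) AS u(a, b, c, d) ON true
WHERE t.type = 30;
```

Two caveats with this sketch: a column that has no values at all for the given type comes back as NULL rather than {}, and genuine NULL elements inside an array are dropped together with the padding.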

answered Aug 5, 2019 at 12:53 by Ajay

edited Aug 5, 2019 at 13:13 by 404

3 Comments

Yevhen Bondar Over a year ago

I get an error: ERROR: column "a_list" does not exist LINE 2: unnest( a_list::text[] )::integer as a_list,

Yevhen Bondar Over a year ago

Thank you, works great. One more question: why do I need the casts ::text[] and ::integer?

Ajay Over a year ago

Actually, I thought the data type was string. There is no need for the casts, since you are already using an integer array data type. You can try unnest( a_list::integer[] ) as a_list instead. It works fine. :)
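
If a reusable function in the shape the question asks for is preferred, a user-defined aggregate is one option. The following is a minimal sketch, not a built-in: the names array_unique_sfunc and array_unique_agg are hypothetical, and it assumes the columns are int[] as in the question.

```sql
-- Sketch only: state transition function that merges the next array
-- into the accumulator and keeps the result distinct and sorted.
CREATE FUNCTION array_unique_sfunc(acc int[], val int[])
RETURNS int[]
LANGUAGE sql
IMMUTABLE
AS $$
    SELECT ARRAY(
        SELECT DISTINCT x
        FROM unnest(acc || val) AS u(x)
        ORDER BY x
    );
$$;

-- Hypothetical aggregate built on the function above.
-- INITCOND = '{}' starts from an empty array, so empty inputs yield {}.
CREATE AGGREGATE array_unique_agg(int[]) (
    SFUNC    = array_unique_sfunc,
    STYPE    = int[],
    INITCOND = '{}'
);

-- Usage in the shape of the query from the question.
SELECT
    array_unique_agg(a_list) AS a_list,
    array_unique_agg(b_list) AS b_list,
    array_unique_agg(c_list) AS c_list,
    array_unique_agg(d_list) AS d_list,
    count(*) AS count
FROM t
WHERE type = 30;
```

This version re-sorts the accumulated array on every input row, so on a large table it is likely slower than unnesting once and using array_agg(distinct ...). It trades speed for a simpler query.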

Joe · Accepted Answer · 2024-01-06 14:07:21Z

Try this take on an old answer from the post "All Permutations of an Array".

It lists the unique ordered permutations of an INT array, grouped by something (row_id and instrument_id in my case), in quite decent time for arrays of length <= 10:

You may need to install the intarray extension ...

WITH RECURSIVE data AS (
                           SELECT a1.instrument_id, ARRAY_AGG(a1.obs_pos ORDER BY a1.obs_pos) AS arr
                           FROM tmp_xxxx a1
                           GROUP BY 1
                       )
   , keys           AS (
                           SELECT instrument_id, GENERATE_SUBSCRIPTS(d.arr, 1) AS rn
                           FROM data d
                       )
   , cte            AS (
                           SELECT DISTINCT x.instrument_id
                                         , public.sort(x.initial_arr) AS initial_arr
                                         , public.sort(x.new_arr)     AS new_arr
                                         , public.sort(x.used_rn)     AS used_rn
                           FROM (
                                    SELECT d.instrument_id
                                         , d.arr               initial_arr
                                         , ARRAY [d.arr[k.rn]] new_arr
                                         , ARRAY [k.rn]        used_rn
                                    FROM data d
                                    JOIN keys k
                                         ON d.instrument_id = k.instrument_id
                                ) x
                           UNION ALL
                           SELECT DISTINCT c.instrument_id
                                         , public.sort(initial_arr)                      AS initial_arr
                                         , public.sort(c.new_arr || c.initial_arr[k.rn]) AS new_arr
                                         , public.sort(used_rn || k.rn)                  AS used_rn
                           FROM cte  c
                           JOIN keys k
                                ON c.instrument_id = k.instrument_id AND NOT (k.rn = ANY (c.used_rn))
                       )
INSERT
INTO tmp_xxxx( row_id
                      , instrument_id
                      , obs_pos_array
                      )
-- _row_id is not defined in this snippet; presumably it comes from the surrounding code.
SELECT DISTINCT _row_id              AS row_id
              , cte.instrument_id
              , public.sort(new_arr) AS obs_pos_array
FROM cte
WHERE ARRAY_LENGTH(new_arr, 1) >= 2 -- change it to your needs
ON CONFLICT ON CONSTRAINT pk_xxxx DO NOTHING;
