How to group by array?

Question

I have the dataset below:

id | object |
---|--------|
id1| object1|
id1| object2|
id1| object3|
id2| object1|
id2| object3|
id2| object4|
id3| object2|
id3| object3|
id3| object4|
id4| object1|
id4| object2|
id4| object3|
id5| object1| 
id5| object2|
id6| object1|
id6| object2|

I need to group the IDs that share the same array of values in column 'object', and count the IDs per group, like this:

object | count()
----------------
object1|   2   |<-from id1 and id4
object2|   2   |
object3|   2   |
----------------
object1|   2   |<-from id5 and id6
object2|   2   |
----------------
object1|   1   |<-from id2
object3|   1   |
object4|   1   |
----------------
object2|   1   |<-from id3
object3|   1   | 
object4|   1   |

How can I group my data by coinciding (identical) arrays?

How do you decide which IDs you want to bucket together? If it's really coincidental, you'll have to build a case statement to put the ones you want together.
– Andrew, commented Dec 20, 2018 at 19:16

Sentinel · Accepted Answer · 2018-12-20 19:57:14Z

It looks like you want to identify groups of IDs based on their set of common objects. In your first group, IDs 1 and 4 are associated with the same three objects: 1, 2, and 3.

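To do this, the first step is to uniquely identify each group. In PostgreSQL, array_agg can be used as an analytic (window) function for this: partitioned by id, ordered by object, and with a frame that spans the whole partition, it attaches the complete ordered array of an ID's objects to every row of that ID. Once the groups are identified, you can count the relevant IDs as shown below: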

Query 1:

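with grp as (
  -- attach each id's full, ordered object array to every one of its rows
  select id
       , object
       , array_agg(object)
         over (partition by id order by object
               rows between unbounded preceding
                        and unbounded following) objs
    from YourData
)
-- ids with identical arrays fall into the same group; count them per object
select min(id) first_id
     , object
     , count(id) cnt
  from grp
 group by objs, object
 order by cnt desc, first_id, object;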

Results:

| first_id |  object | cnt |
|----------|---------|-----|
|      id1 | object1 |   2 |
|      id1 | object2 |   2 |
|      id1 | object3 |   2 |
|      id5 | object1 |   2 |
|      id5 | object2 |   2 |
|      id2 | object1 |   1 |
|      id2 | object3 |   1 |
|      id2 | object4 |   1 |
|      id3 | object2 |   1 |
|      id3 | object3 |   1 |
|      id3 | object4 |   1 |
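
To reproduce these results locally, a minimal setup sketch follows. It reuses the table name YourData from Query 1 and loads the rows from the question; the column types are an assumption, since the question does not state them.

```sql
-- Assumed schema: the question does not give column types, so text is a guess.
create table YourData (
    id     text not null,
    object text not null
);

-- Rows copied from the dataset in the question.
insert into YourData (id, object) values
    ('id1', 'object1'), ('id1', 'object2'), ('id1', 'object3'),
    ('id2', 'object1'), ('id2', 'object3'), ('id2', 'object4'),
    ('id3', 'object2'), ('id3', 'object3'), ('id3', 'object4'),
    ('id4', 'object1'), ('id4', 'object2'), ('id4', 'object3'),
    ('id5', 'object1'), ('id5', 'object2'),
    ('id6', 'object1'), ('id6', 'object2');
```

With this data loaded, Query 1 can be run as written against YourData.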

Gordon Linoff · Accepted Answer · 2018-12-20 19:24:46Z

If I understand correctly, you want something like this:

select objects, array_agg(id) as ids, count(*) as num_ids
from (select id, array_agg(object order by object) as objects  -- sort by object so equal sets give equal arrays
      from t
      group by id
     ) i
group by objects;
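
If one row per object is wanted, as in the layout the question asks for, the grouped arrays can be expanded again with unnest. The following is only a sketch under assumptions: it keeps the table name t from this answer, sorts each array by object so that IDs with the same set of objects produce identical arrays, and the aliases g, o, and cnt are made up for illustration.

```sql
-- Sketch: group ids by their sorted object array, then expand each array
-- back into one row per object. Aliases g, o and cnt are illustrative.
select g.ids
     , o.object
     , g.num_ids as cnt
from (
    select objects
         , array_agg(id order by id) as ids   -- ids that share this array
         , count(*) as num_ids
    from (
        select id
             , array_agg(object order by object) as objects
        from t
        group by id
    ) i
    group by objects
) g
cross join lateral unnest(g.objects) as o(object)
order by g.num_ids desc, g.ids, o.object;
```

Carrying the ids array along shows which IDs form each group, which is the information the question notes by hand next to each block.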

