
I have the dataset below:

id | object |
---|--------|
id1| object1|
id1| object2|
id1| object3|
id2| object1|
id2| object3|
id2| object4|
id3| object2|
id3| object3|
id3| object4|
id4| object1|
id4| object2|
id4| object3|
id5| object1| 
id5| object2|
id6| object1|
id6| object2|

I need to group together the IDs that share exactly the same set of values in column 'object', and count them, like this:

object | count()
----------------
object1|   2   |<-from id1 and id4
object2|   2   |
object3|   2   |
----------------
object1|   2   |<-from id5 and id6
object2|   2   |
----------------
object1|   1   |<-from id2
object3|   1   |
object4|   1   |
----------------
object2|   1   |<-from id3
object3|   1   | 
object4|   1   |

How can I group my data by coincidental (identical) arrays?

    How do you decide which IDs you want to bucket together? If it's really coincidental, you'll have to build a case statement to put the ones you want together. Commented Dec 20, 2018 at 19:16

2 Answers


It looks like you want to identify groups of IDs based on their set of common objects. In your first group, IDs 1 and 4 are associated with the same three objects: 1, 2, and 3.

The first step is to uniquely identify each group. In PostgreSQL, array_agg used as an analytic (window) function can do this: it attaches the full, sorted array of an ID's objects to every row of that ID. Once the groups are identified, you can count the relevant IDs by grouping on that array, as shown below and in this SQL Fiddle:

Query 1:

with grp as (
  -- Attach to every row the full, sorted array of objects for its id.
  select id
       , object
       , array_agg(object) 
         over (partition by id order by object
               rows between unbounded preceding
                        and unbounded following) objs
   from YourData
)
-- IDs with identical arrays fall into the same group.
select min(id) first_id
     , object
     , count(id) cnt
  from grp
 group by objs, object
 order by cnt desc, first_id, object;

Results:

| first_id |  object | cnt |
|----------|---------|-----|
|      id1 | object1 |   2 |
|      id1 | object2 |   2 |
|      id1 | object3 |   2 |
|      id5 | object1 |   2 |
|      id5 | object2 |   2 |
|      id2 | object1 |   1 |
|      id2 | object3 |   1 |
|      id2 | object4 |   1 |
|      id3 | object2 |   1 |
|      id3 | object3 |   1 |
|      id3 | object4 |   1 |
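
To try the query locally, the sample data from the question could be loaded roughly as follows. This is only a sketch: the table name YourData is taken from the query above, while the text column types are an assumption, since the question does not state them.

```sql
-- Assumption: both columns are text; the question does not give the types.
create table YourData (
    id     text not null,
    object text not null
);

-- The sixteen rows listed in the question.
insert into YourData (id, object) values
    ('id1', 'object1'), ('id1', 'object2'), ('id1', 'object3'),
    ('id2', 'object1'), ('id2', 'object3'), ('id2', 'object4'),
    ('id3', 'object2'), ('id3', 'object3'), ('id3', 'object4'),
    ('id4', 'object1'), ('id4', 'object2'), ('id4', 'object3'),
    ('id5', 'object1'), ('id5', 'object2'),
    ('id6', 'object1'), ('id6', 'object2');
```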

If I understand correctly, you want something like this:

select objects, array_agg(id) as ids, count(*) as num_ids
from (select id, array_agg(object order by object) as objects
      from t
      group by id
     ) i
group by objects;
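
If one row per object is wanted, as in the layout shown in the question, the grouped arrays could be expanded again with unnest. The following is a sketch under assumptions: it uses the same table name t as the query above, sorts each array by object so that identical sets compare equal, and the aliases g and o are made up for illustration.

```sql
-- Step 1 (innermost): build one sorted array of objects per id.
-- Step 2: group ids whose arrays are identical.
-- Step 3: expand each array back into one row per object.
select g.ids, o.object, g.num_ids
from (select objects
           , array_agg(id order by id) as ids
           , count(*) as num_ids
      from (select id, array_agg(object order by object) as objects
            from t
            group by id
           ) i
      group by objects
     ) g
cross join lateral unnest(g.objects) as o(object)
order by g.num_ids desc, g.ids, o.object;
```

Sorting inside array_agg matters here because arrays are compared element by element, so the same objects in a different order would otherwise land in different groups.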
