I have a table like this:

SELECT id, items
  FROM ( VALUES
    ( '1', ARRAY['A', 'B'] ),
    ( '2', ARRAY['A', 'B', 'C'] ),
    ( '3', ARRAY['E', 'F'] ),
    ( '4', ARRAY['G'] )
  ) AS t(id, items)

Two rows belong to the same group if they have at least one item in common.

For example, #1 and #2 belong to the same group because they both have A and B. #3 and #4 each form a separate group.

So my desired output would be:

 id | items   | group_alias
----+---------+-------------
  1 | {A,B}   | {A,B}
  2 | {A,B,C} | {A,B}
  3 | {E,F}   | {E,F}
  4 | {G}     | {G}

The group_alias field is a new field that tells me that records #1 and #2 belong to the same group.

  • What would you do with {B,C} or {C}? Commented Jul 15, 2022 at 15:37
  • If #5 has {B,C}, what would group_alias look like? Commented Jul 15, 2022 at 15:50
  • @xehpuk It's fine for it to be a string or another array_agg(). The application will use it to group by string comparison. Commented Jul 15, 2022 at 16:50
  • @richyen it would be {B} and the same for #1 and #2. The minimum common item. Commented Jul 15, 2022 at 16:52
  • I don't understand your answer. If you have {A,B}, {B,C} and {C,A}, there is no "minimum common item". Commented Jul 15, 2022 at 17:20

2 Answers

Given the following table and index:

CREATE TABLE temp1
(
    id int PRIMARY KEY,
    items char[] NOT NULL
);

INSERT INTO temp1 VALUES
    ( '1', ARRAY['A', 'B'] ),
    ( '2', ARRAY['A', 'B', 'C'] ),
    ( '3', ARRAY['E', 'F'] ),
    ( '4', ARRAY['G'] );

-- Index the array column to speed up overlap (&&) queries
CREATE INDEX idx_items ON temp1 USING GIN ("items");

Then the following query computes group_alias for each row:

SELECT t1.*,
       coalesce(
           -- smallest overlapping array from another row (minimum common)
           (SELECT t2.items
              FROM temp1 t2
             WHERE t2.items && t1.items
               AND t1.id != t2.id
               AND array_length(t2.items, 1) < array_length(t1.items, 1)
             ORDER BY array_length(t2.items, 1)
             LIMIT 1),
           -- no smaller overlapping row: fall back to the row's own items
           t1.items
       ) AS group_alias
  FROM temp1 t1;

https://www.db-fiddle.com/f/46ydeE5ZXCJDk4Rw3cu4jt/10

Comments

Row #2 should get group_alias ["B"], because #1, #2 and #3 all contain at least "B".
I'll give it a try later. Is it a programming challenge?
No, it's a real case scenario for an application.
Now it works correctly; there was an error in the ORDER BY clause.
It works, but with a larger dataset (100k rows) the query times out. Is there an alternative method?
This query returns every group alias of a row. For example, row 5 has the group aliases {E} and {A,B}. Performance may be better if you load the items into a temporary table instead of generating them dynamically, as you mentioned in a comment. Temporary tables are dropped automatically at the end of the session, and you can create indexes on them, which can speed up the query.

CREATE TEMP TABLE temp
(
    id int PRIMARY KEY,
    items char[] NOT NULL
);

INSERT INTO temp VALUES
( '1', ARRAY['A', 'B'] ),
( '2', ARRAY['A', 'B', 'C'] ),
( '3', ARRAY['E', 'F'] ),
( '4', ARRAY['G'] ),
( '5', ARRAY['A', 'B', 'E'] );
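
The index mentioned above could look like the following. This is only a minimal sketch: it assumes the `temp` table just created, and the index name `idx_temp_items` is made up for illustration. On a table this small the planner will most likely still pick a sequential scan, so the `EXPLAIN` line only becomes informative once the table holds a realistic number of rows.

```sql
-- Hypothetical index name; GIN indexes support the array overlap operator (&&).
CREATE INDEX idx_temp_items ON temp USING GIN (items);

-- Refresh planner statistics by hand: autovacuum does not process temporary tables.
ANALYZE temp;

-- Inspect the plan of an overlap lookup to see whether the index is considered.
EXPLAIN
SELECT id FROM temp WHERE items && ARRAY['A']::char[];
```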

The query:

SELECT DISTINCT
    t1.id, t1.items, coalesce(match, t1.items) AS group_alias
FROM temp t1 LEFT JOIN (
    SELECT
        t2.id, match
    FROM
        temp t2,
        LATERAL(
            SELECT
                match
            FROM
                temp t3,
                LATERAL(
                    SELECT 
                        array_agg(aa) AS match
                    FROM 
                        unnest(t2.items) aa 
                    JOIN 
                        unnest(t3.items) ab 
                    ON  aa = ab
                ) AS m1
            WHERE 
                t2.id != t3.id AND t2.items && t3.items 
        ) AS m2
    ) AS groups 
ON groups.id = t1.id 
ORDER BY t1.id;

And the result:

 id |  items  | group_alias
----+---------+-------------
  1 | {A,B}   | {A,B}
  2 | {A,B,C} | {A,B}
  3 | {E,F}   | {E}
  4 | {G}     | {G}
  5 | {A,B,E} | {A,B}
  5 | {A,B,E} | {E}
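
If the application needs exactly one key per row (row 5 above gets two aliases), one possible alternative is to treat overlapping rows as connected and give every row in a connected group the same label. This is a different technique from the query above: a recursive CTE that computes connected components. The sketch below is self-contained and uses the same five sample rows; the names `data`, `edges`, `reach` and `group_id` are assumptions made up for illustration, and the label is simply the smallest id in the group rather than an array of common items.

```sql
WITH RECURSIVE data(id, items) AS (
    -- same sample rows as above, inlined so the query runs on its own
    VALUES
        (1, ARRAY['A', 'B']),
        (2, ARRAY['A', 'B', 'C']),
        (3, ARRAY['E', 'F']),
        (4, ARRAY['G']),
        (5, ARRAY['A', 'B', 'E'])
),
edges AS (
    -- every row is connected to itself, so isolated rows are kept
    SELECT id AS src, id AS dst FROM data
    UNION
    -- one edge per ordered pair of distinct rows whose arrays overlap
    SELECT a.id, b.id
    FROM data a
    JOIN data b ON a.id <> b.id AND a.items && b.items
),
reach AS (
    -- start from the direct edges ...
    SELECT src, dst FROM edges
    UNION
    -- ... and keep following edges; UNION removes duplicates, so this terminates
    SELECT r.src, e.dst
    FROM reach r
    JOIN edges e ON e.src = r.dst
)
SELECT d.id, d.items, min(r.dst) AS group_id  -- smallest reachable id labels the group
FROM data d
JOIN reach r ON r.src = d.id
GROUP BY d.id, d.items
ORDER BY d.id;
```

With these sample rows, rows 1, 2, 3 and 5 all get group_id 1, because row 5 links {E,F} to the {A,B} rows, and row 4 gets group_id 4. Note that `reach` can grow quadratically with the size of a group, so this sketch is not expected to scale to large tables as written.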
