Merging similar columns in PostgreSQL

Question

I'm working with PostgreSQL. I have two tables; assume for the sake of this problem that me can take multiple IDs. The first table, Table1, records messages sent:

me | friends | messages_sent
----------------------------
0      1            10
0      2             7 
0      3             7          
0      4             6
1      1             5
1      2            12
...

The second table, Table2, records messages received:

me | friends | messages_received
----------------------------
 0      4            17
 0      2             7 
 0      1             9          
 0      3             0
 ...

How can I get a table like the following (the order of friends is not important)?

    me | friends | messages_total
    ----------------------------
    0      1            19
    0      2            14 
    0      3             7          
    0      4            23
    ...

The part I'm stumped on is joining both tables on me and then adding up the values for each friend that shares the same value of me. Any thoughts?

Don't use the MySQL tag if you're working with PostgreSQL.

– Barmar, Commented Sep 14, 2016 at 23:59

Patrick · Accepted Answer · 2016-09-15 07:59:12Z

1

You should join the two tables using both fields me and friends and then simply add up the messages received and sent. Using a FULL JOIN ensures that all situations, such as me sending but not receiving from a friend and vice-versa, are included.

SELECT me, friends,
       coalesce(messages_sent, 0) + coalesce(messages_received, 0) AS messages_total
FROM Table1
FULL JOIN Table2 USING (me, friends)
ORDER BY me;
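As a quick check, the same query can be run against the sample rows shown in the question. This is only a sketch: the CTEs named table1 and table2 are stand-ins built from the visible rows (the elided rows are not reproduced), and friends is added to the ORDER BY just to make the output order predictable.

```sql
-- Stand-in data: only the rows visible in the question.
WITH table1 (me, friends, messages_sent) AS (
    VALUES (0, 1, 10), (0, 2, 7), (0, 3, 7), (0, 4, 6), (1, 1, 5), (1, 2, 12)
),
table2 (me, friends, messages_received) AS (
    VALUES (0, 4, 17), (0, 2, 7), (0, 1, 9), (0, 3, 0)
)
SELECT me, friends,
       coalesce(messages_sent, 0) + coalesce(messages_received, 0) AS messages_total
FROM table1
FULL JOIN table2 USING (me, friends)
ORDER BY me, friends;

--  me | friends | messages_total
-- ----+---------+----------------
--   0 |       1 |             19
--   0 |       2 |             14
--   0 |       3 |              7
--   0 |       4 |             23
--   1 |       1 |              5
--   1 |       2 |             12
```

The rows for me = 1 have no match in the stand-in table2, so they survive only because of the FULL JOIN, and their totals come from messages_sent alone.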

edited Sep 15, 2016 at 7:59

answered Sep 15, 2016 at 3:06


1 Comment

Taleh Ibrahimli Over a year ago

I think you can change messages_sent + messages_received to COALESCE(messages_sent, 0) + COALESCE(messages_received, 0). That way, if either table has no row for a specific combination, the sum will not be NULL.
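The point about NULL can be seen in isolation. In the sketch below, the literal 10 and the NULL::integer cast are made-up stand-ins for a messages_sent value and a missing messages_received value from the unmatched side of the join.

```sql
-- Arithmetic with NULL yields NULL; coalesce substitutes 0 first.
SELECT 10 + NULL::integer              AS without_coalesce,  -- NULL
       10 + coalesce(NULL::integer, 0) AS with_coalesce;     -- 10
```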

redneb · Accepted Answer · 2016-09-15 09:15:34Z

You can simply generate the union of the two tables and then use a GROUP BY on each combination of me and friends, adding up the message counts with an aggregate function:

SELECT me, friends, sum(count) AS messages_total
FROM (
    SELECT me, friends, messages_sent AS count FROM Table1
    UNION ALL
    SELECT me, friends, messages_received FROM Table2
) AS t
GROUP BY me, friends;
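For comparison, here is the same approach run against the sample rows from the question. It is a sketch under assumptions: the CTEs are stand-ins built from the visible rows only, the column alias msg_count is a hypothetical rename of count, and the ORDER BY is added just to make the output order predictable.

```sql
-- Stand-in data: only the rows visible in the question.
WITH table1 (me, friends, messages_sent) AS (
    VALUES (0, 1, 10), (0, 2, 7), (0, 3, 7), (0, 4, 6), (1, 1, 5), (1, 2, 12)
),
table2 (me, friends, messages_received) AS (
    VALUES (0, 4, 17), (0, 2, 7), (0, 1, 9), (0, 3, 0)
)
SELECT me, friends, sum(msg_count) AS messages_total
FROM (
    -- Stack sent and received counts into one column, then total per pair.
    SELECT me, friends, messages_sent AS msg_count FROM table1
    UNION ALL
    SELECT me, friends, messages_received FROM table2
) AS t
GROUP BY me, friends
ORDER BY me, friends;

--  me | friends | messages_total
-- ----+---------+----------------
--   0 |       1 |             19
--   0 |       2 |             14
--   0 |       3 |              7
--   0 |       4 |             23
--   1 |       1 |              5
--   1 |       2 |             12
```

No coalesce is needed here: a pair that appears in only one table simply contributes one row to its group. Note that sum() over integer input returns bigint in PostgreSQL, so messages_total has a wider type than in the join version.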

Edit: I was about to edit my answer in order to add a note recommending Patrick's answer as being better, but I decided for fun to run a simple benchmark. So if we have the following setup (1 million rows for each table):

CREATE TABLE table1 (
    me integer not null,
    friends integer not null,
    messages_sent integer not null
);
CREATE TABLE table2 (
    me integer not null,
    friends integer not null,
    messages_received integer not null
);
INSERT INTO table1 SELECT n1, n2, floor(random()*10)::integer FROM generate_series(1, 1000) t1(n1), generate_series(1, 1000) t2(n2);
INSERT INTO table2 SELECT n1, n2, floor(random()*10)::integer FROM generate_series(1, 1000) t1(n1), generate_series(1, 1000) t2(n2);
CREATE INDEX ON table1(me, friends);
CREATE INDEX ON table2(me, friends);
ANALYZE;

Then the first solution:

$ EXPLAIN ANALYZE
      SELECT me, friends, sum(count) AS messages_total
      FROM (
          SELECT me, friends, messages_sent AS count FROM Table1
          UNION ALL
          SELECT me, friends, messages_received FROM Table2
      ) AS t
      GROUP BY me, friends;
                                                          QUERY PLAN                                                          
------------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=45812.00..46212.00 rows=40000 width=12) (actual time=1201.602..1499.285 rows=1000000 loops=1)
   Group Key: table1.me, table1.friends
   ->  Append  (cost=0.00..30812.00 rows=2000000 width=12) (actual time=0.022..299.260 rows=2000000 loops=1)
         ->  Seq Scan on table1  (cost=0.00..15406.00 rows=1000000 width=12) (actual time=0.020..91.357 rows=1000000 loops=1)
         ->  Seq Scan on table2  (cost=0.00..15406.00 rows=1000000 width=12) (actual time=0.004..77.672 rows=1000000 loops=1)
 Planning time: 0.255 ms
 Execution time: 1529.642 ms

And the second solution:

$ EXPLAIN ANALYZE
    SELECT me, friends,
           coalesce(messages_sent, 0) + coalesce(messages_received, 0) AS messages_total
    FROM Table1
    FULL JOIN Table2 USING (me, friends)
    ORDER BY me;
                                                                     QUERY PLAN                                                                          
-------------------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=219582.13..222082.13 rows=1000000 width=24) (actual time=1501.873..1583.915 rows=1000000 loops=1)
   Sort Key: (COALESCE(table1.me, table2.me))
   Sort Method: external sort  Disk: 21512kB
   ->  Merge Full Join  (cost=0.85..99414.29 rows=1000000 width=24) (actual time=0.074..912.598 rows=1000000 loops=1)
         Merge Cond: ((table1.me = table2.me) AND (table1.friends = table2.friends))
         ->  Index Scan using table1_me_friends_idx on table1  (cost=0.42..38483.49 rows=1000000 width=12) (actual time=0.039..165.772 rows=1000000 loops=1)
         ->  Index Scan using table2_me_friends_idx on table2  (cost=0.42..38483.49 rows=1000000 width=12) (actual time=0.018..194.177 rows=1000000 loops=1)
 Planning time: 1.091 ms
 Execution time: 1615.011 ms

So, surprisingly, the solution with the FULL JOIN performs slightly worse, even though it can use the index. I guess this has to do with the full join; for other types of join it would have been much better.
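One caveat the plans above suggest: the FULL JOIN query carries an ORDER BY me, while the UNION ALL query has no ORDER BY. In the second plan the Merge Full Join node finishes at about 913 ms and the external Sort node accounts for most of the remaining time. A like-for-like comparison could drop the ORDER BY (or add the same one to both queries). The following is a sketch to run against the benchmark tables defined above; no timings are claimed for it here.

```sql
-- Same FULL JOIN query as benchmarked above, minus the ORDER BY,
-- so that neither candidate pays for a sort.
EXPLAIN ANALYZE
SELECT me, friends,
       coalesce(messages_sent, 0) + coalesce(messages_received, 0) AS messages_total
FROM table1
FULL JOIN table2 USING (me, friends);
```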

In SQL there are usually several ways to do something and achieve identical results; some good, many bad. This answer is one of the second category.
