I'm working with PostgreSQL. I have two tables; assume for the sake of this problem that me can take multiple IDs. The first table, Table1, holds messages sent:

me | friends | messages_sent
----------------------------
0      1            10
0      2             7 
0      3             7          
0      4             6
1      1             5
1      2            12
...

The second table, Table2, holds messages received:

me | friends | messages_received
----------------------------
 0      4            17
 0      2             7 
 0      1             9          
 0      3             0
 ...

How can I get a table like this (the order of friends is not important):

    me | friends | messages_total
    ----------------------------
    0      1            19
    0      2            14 
    0      3             7          
    0      4            23
    ...

The part I'm stumped on is joining both tables on me and then adding up the values for each friend where me matches. Thoughts?

Don't use the MySQL tag if you're working with PostgreSQL. Commented Sep 14, 2016 at 23:59

2 Answers

You should join the two tables using both fields me and friends and then simply add up the messages received and sent. Using a FULL JOIN ensures that all situations, such as me sending but not receiving from a friend and vice-versa, are included.

SELECT me, friends,
       coalesce(messages_sent, 0) + coalesce(messages_received, 0) AS messages_total
FROM Table1
FULL JOIN Table2 USING (me, friends)
ORDER BY me;
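To see what the FULL JOIN and coalesce buy you, here is a minimal sketch that inlines the sample rows from the question as CTEs, so it runs without creating any tables. The row (0, 5, 2) in table2 is a hypothetical addition of mine, not part of the question's data; it stands for a friend who only appears on the received side.

```sql
-- Sample rows from the question, inlined as CTEs.
WITH table1 (me, friends, messages_sent) AS (
    VALUES (0, 1, 10), (0, 2, 7), (0, 3, 7), (0, 4, 6)
),
table2 (me, friends, messages_received) AS (
    -- (0, 5, 2) is a hypothetical row with no match in table1
    VALUES (0, 4, 17), (0, 2, 7), (0, 1, 9), (0, 3, 0), (0, 5, 2)
)
SELECT me, friends,
       coalesce(messages_sent, 0) + coalesce(messages_received, 0) AS messages_total
FROM table1
FULL JOIN table2 USING (me, friends)
ORDER BY me, friends;
-- Totals for friends 1..4 are 19, 14, 7, 23, matching the question;
-- friend 5 comes out as 2 instead of being dropped or NULL.
```

With an inner join, friend 5 would disappear from the result; with a FULL JOIN but no coalesce, its total would be NULL.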

1 Comment

I think you can change messages_sent + messages_received to COALESCE(messages_sent, 0) + COALESCE(messages_received, 0). That way, if either table has no row for a specific combination, the sum will not be null.
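For what it's worth, the NULL behaviour this comment refers to can be checked in isolation. A tiny sketch, with the literal 7 chosen arbitrarily for illustration:

```sql
-- Arithmetic with NULL yields NULL; coalesce substitutes 0 first.
SELECT 7 + NULL::integer              AS without_coalesce,  -- NULL
       7 + coalesce(NULL::integer, 0) AS with_coalesce;     -- 7
```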
You can simply generate the union of the two tables and then use a GROUP BY to group combinations of me and friends adding the message counts with an aggregate function:

SELECT me, friends, sum(count) AS messages_total
FROM (
    SELECT me, friends, messages_sent AS count FROM Table1
    UNION ALL
    SELECT me, friends, messages_received FROM Table2
) AS t
GROUP BY me, friends;
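As a quick sanity check, here is a sketch of the same UNION ALL approach run against the sample rows from the question, inlined as CTEs so it needs no tables. The CTE names simply stand in for Table1 and Table2, and the column alias msgs is my own choice (instead of count) to avoid confusion with the count() aggregate.

```sql
-- Sample rows from the question, inlined as CTEs.
WITH table1 (me, friends, messages_sent) AS (
    VALUES (0, 1, 10), (0, 2, 7), (0, 3, 7), (0, 4, 6)
),
table2 (me, friends, messages_received) AS (
    VALUES (0, 4, 17), (0, 2, 7), (0, 1, 9), (0, 3, 0)
)
SELECT me, friends, sum(msgs) AS messages_total
FROM (
    -- Stack both tables into one column of message counts
    SELECT me, friends, messages_sent AS msgs FROM table1
    UNION ALL
    SELECT me, friends, messages_received FROM table2
) AS t
GROUP BY me, friends
ORDER BY me, friends;
-- Totals for friends 1..4: 19, 14, 7, 23
```

Because the rows are stacked rather than joined, a pair that exists in only one table still shows up with its single count, and no coalesce is needed. Note that sum() over integer returns bigint in PostgreSQL.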

Edit: I was about to edit my answer in order to add a note recommending Patrick's answer as being better, but I decided for fun to run a simple benchmark. So if we have the following setup (1 million rows for each table):

CREATE TABLE table1 (
    me integer not null,
    friends integer not null,
    messages_sent integer not null
);
CREATE TABLE table2 (
    me integer not null,
    friends integer not null,
    messages_received integer not null
);
INSERT INTO table1 SELECT n1, n2, floor(random()*10)::integer FROM generate_series(1, 1000) t1(n1), generate_series(1, 1000) t2(n2);
INSERT INTO table2 SELECT n1, n2, floor(random()*10)::integer FROM generate_series(1, 1000) t1(n1), generate_series(1, 1000) t2(n2);
CREATE INDEX ON table1(me, friends);
CREATE INDEX ON table2(me, friends);
ANALYZE;

Then the first solution:

$ EXPLAIN ANALYZE
      SELECT me, friends, sum(count) AS messages_total
      FROM (
          SELECT me, friends, messages_sent AS count FROM Table1
          UNION ALL
          SELECT me, friends, messages_received FROM Table2
      ) AS t
      GROUP BY me, friends;
                                                          QUERY PLAN                                                          
------------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=45812.00..46212.00 rows=40000 width=12) (actual time=1201.602..1499.285 rows=1000000 loops=1)
   Group Key: table1.me, table1.friends
   ->  Append  (cost=0.00..30812.00 rows=2000000 width=12) (actual time=0.022..299.260 rows=2000000 loops=1)
         ->  Seq Scan on table1  (cost=0.00..15406.00 rows=1000000 width=12) (actual time=0.020..91.357 rows=1000000 loops=1)
         ->  Seq Scan on table2  (cost=0.00..15406.00 rows=1000000 width=12) (actual time=0.004..77.672 rows=1000000 loops=1)
 Planning time: 0.255 ms
 Execution time: 1529.642 ms

And the second solution:

$ EXPLAIN ANALYZE
    SELECT me, friends,
           coalesce(messages_sent, 0) + coalesce(messages_received, 0) AS messages_total
    FROM Table1
    FULL JOIN Table2 USING (me, friends)
    ORDER BY me;
                                                                     QUERY PLAN                                                                          
-------------------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=219582.13..222082.13 rows=1000000 width=24) (actual time=1501.873..1583.915 rows=1000000 loops=1)
   Sort Key: (COALESCE(table1.me, table2.me))
   Sort Method: external sort  Disk: 21512kB
   ->  Merge Full Join  (cost=0.85..99414.29 rows=1000000 width=24) (actual time=0.074..912.598 rows=1000000 loops=1)
         Merge Cond: ((table1.me = table2.me) AND (table1.friends = table2.friends))
         ->  Index Scan using table1_me_friends_idx on table1  (cost=0.42..38483.49 rows=1000000 width=12) (actual time=0.039..165.772 rows=1000000 loops=1)
         ->  Index Scan using table2_me_friends_idx on table2  (cost=0.42..38483.49 rows=1000000 width=12) (actual time=0.018..194.177 rows=1000000 loops=1)
 Planning time: 1.091 ms
 Execution time: 1615.011 ms

So, surprisingly, the solution with the FULL JOIN performs slightly worse (about 1615 ms versus 1530 ms in this run), even though it can use the index. I guess this has to do with the full join; for other types of join it would probably have been much better.
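One caveat when reading the two plans: the queries are not quite like for like. The FULL JOIN version includes ORDER BY me, which appears as the Sort node (an external sort on disk) on top of the merge join, while the UNION ALL version has no ORDER BY at all. A sketch of a fairer comparison on the same setup would be to run both without the sort, or both with it. I have not run these, so I make no claim about which one wins:

```sql
-- FULL JOIN version without the ORDER BY
EXPLAIN ANALYZE
SELECT me, friends,
       coalesce(messages_sent, 0) + coalesce(messages_received, 0) AS messages_total
FROM table1
FULL JOIN table2 USING (me, friends);

-- UNION ALL version with the same ORDER BY added
EXPLAIN ANALYZE
SELECT me, friends, sum(count) AS messages_total
FROM (
    SELECT me, friends, messages_sent AS count FROM table1
    UNION ALL
    SELECT me, friends, messages_received FROM table2
) AS t
GROUP BY me, friends
ORDER BY me;
```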

1 Comment

In SQL there are usually several ways to do something and achieve identical results; some good, many bad. This answer is one of the second category.
