2016-09-14 69 views
1

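Merging similar columns in PostgreSQL

I am using PostgreSQL. I have two tables; for the sake of this question, assume me has multiple IDs. The first table, Table1, records messages sent: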

me | friends | messages_sent
----------------------------
 0 |       1 |            10
 0 |       2 |             7
 0 |       3 |             7
 0 |       4 |             6
 1 |       1 |             5
 1 |       2 |            12
...

The second table, Table2, records messages received:

me | friends | messages_received
--------------------------------
 0 |       4 |                17
 0 |       2 |                 7
 0 |       1 |                 9
 0 |       3 |                 0
...

How can I get a table like this (the order of friends doesn't matter):

me | friends | messages_total
-----------------------------
 0 |       1 |             19
 0 |       2 |             14
 0 |       3 |              7
 0 |       4 |             23
...

The part I'm stuck on is joining the two tables on me and then adding up the values for a friend, given an equal value of me... Ideas?

+2

If you are using PostgreSQL, don't use the MySQL tag. – Barmar

Answers

1

You can simply build the union of the two tables, then GROUP BY the (me, friends) combination and add up the messages with an aggregate function:

SELECT me, friends, sum(count) AS messages_total 
FROM (
    SELECT me, friends, messages_sent AS count FROM Table1 
    UNION ALL 
    SELECT me, friends, messages_received FROM Table2 
) AS t 
GROUP BY me, friends; 
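
As a quick sanity check, the query can be run against the sample rows from the question. The sketch below is not part of the original answer: it uses Python's built-in sqlite3 module as a stand-in for PostgreSQL so that it is self-contained, the helper name total_messages is made up for illustration, the inner alias is renamed from count to n to avoid any confusion with the count() function, and an ORDER BY is added only to make the output deterministic.

```python
import sqlite3

# Sample rows taken from the question: (me, friends, messages)
SENT = [(0, 1, 10), (0, 2, 7), (0, 3, 7), (0, 4, 6), (1, 1, 5), (1, 2, 12)]
RECEIVED = [(0, 4, 17), (0, 2, 7), (0, 1, 9), (0, 3, 0)]


def total_messages(sent_rows, received_rows):
    """Hypothetical helper: load both tables, then sum per (me, friends)."""
    conn = sqlite3.connect(":memory:")  # SQLite stands in for PostgreSQL here
    conn.execute(
        "CREATE TABLE Table1 (me INTEGER, friends INTEGER, messages_sent INTEGER)"
    )
    conn.execute(
        "CREATE TABLE Table2 (me INTEGER, friends INTEGER, messages_received INTEGER)"
    )
    conn.executemany("INSERT INTO Table1 VALUES (?, ?, ?)", sent_rows)
    conn.executemany("INSERT INTO Table2 VALUES (?, ?, ?)", received_rows)
    rows = conn.execute(
        """
        SELECT me, friends, sum(n) AS messages_total
        FROM (
            SELECT me, friends, messages_sent AS n FROM Table1
            UNION ALL
            SELECT me, friends, messages_received FROM Table2
        ) AS t
        GROUP BY me, friends
        ORDER BY me, friends
        """
    ).fetchall()
    conn.close()
    return rows


for row in total_messages(SENT, RECEIVED):
    print(row)
```

The rows for me = 0 match the totals asked for in the question (19, 14, 7, 23), and the rows for me = 1, which appear only in Table1 in this sample, come through with their sent counts unchanged.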

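Edit: I was about to edit my answer to add a note recommending Patrick's answer as the better one, but I decided to run a simple benchmark for fun. So, given the following setup (1 million rows in each table):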

CREATE TABLE table1 (
    me integer not null, 
    friends integer not null, 
    messages_sent integer not null 
); 
CREATE TABLE table2 (
    me integer not null, 
    friends integer not null, 
    messages_received integer not null 
); 
INSERT INTO table1 SELECT n1, n2, floor(random()*10)::integer FROM generate_series(1, 1000) t1(n1), generate_series(1, 1000) t2(n2); 
INSERT INTO table2 SELECT n1, n2, floor(random()*10)::integer FROM generate_series(1, 1000) t1(n1), generate_series(1, 1000) t2(n2); 
CREATE INDEX ON table1(me, friends); 
CREATE INDEX ON table2(me, friends); 
ANALYZE; 

Then the first solution:

$ EXPLAIN ANALYZE 
     SELECT me, friends, sum(count) AS messages_total 
     FROM (
      SELECT me, friends, messages_sent AS count FROM Table1 
      UNION ALL 
      SELECT me, friends, messages_received FROM Table2 
    ) AS t 
     GROUP BY me, friends; 
                  QUERY PLAN               
------------------------------------------------------------------------------------------------------------------------------ 
HashAggregate (cost=45812.00..46212.00 rows=40000 width=12) (actual time=1201.602..1499.285 rows=1000000 loops=1) 
    Group Key: table1.me, table1.friends 
    -> Append (cost=0.00..30812.00 rows=2000000 width=12) (actual time=0.022..299.260 rows=2000000 loops=1) 
     -> Seq Scan on table1 (cost=0.00..15406.00 rows=1000000 width=12) (actual time=0.020..91.357 rows=1000000 loops=1) 
     -> Seq Scan on table2 (cost=0.00..15406.00 rows=1000000 width=12) (actual time=0.004..77.672 rows=1000000 loops=1) 
Planning time: 0.255 ms 
Execution time: 1529.642 ms 

And the second solution:

$ EXPLAIN ANALYZE 
    SELECT me, friends, 
      coalesce(messages_sent, 0) + coalesce(messages_received, 0) AS messages_total 
    FROM Table1 
    FULL JOIN Table2 USING (me, friends) 
    ORDER BY me; 
                    QUERY PLAN                   
------------------------------------------------------------------------------------------------------------------------------------------------------------- 
Sort (cost=219582.13..222082.13 rows=1000000 width=24) (actual time=1501.873..1583.915 rows=1000000 loops=1) 
    Sort Key: (COALESCE(table1.me, table2.me)) 
    Sort Method: external sort Disk: 21512kB 
    -> Merge Full Join (cost=0.85..99414.29 rows=1000000 width=24) (actual time=0.074..912.598 rows=1000000 loops=1) 
     Merge Cond: ((table1.me = table2.me) AND (table1.friends = table2.friends)) 
     -> Index Scan using table1_me_friends_idx on table1 (cost=0.42..38483.49 rows=1000000 width=12) (actual time=0.039..165.772 rows=1000000 loops=1) 
     -> Index Scan using table2_me_friends_idx on table2 (cost=0.42..38483.49 rows=1000000 width=12) (actual time=0.018..194.177 rows=1000000 loops=1) 
Planning time: 1.091 ms 
Execution time: 1615.011 ms 

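Surprisingly, even though it can make use of the indexes, the FULL JOIN solution performs slightly worse. I guess this has to do with it being a full join; for other join types it would do much better.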

+1

In SQL there are usually several ways to do something and get the same result; some are good, many are bad. This answer is in the second category. – Patrick

1

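You should join the two tables on both fields, me and friends, and then simply add the received and sent messages together. Using a FULL JOIN ensures that all cases are included where I sent messages to a friend but received none from them, and vice versa.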

SELECT me, friends, 
     coalesce(messages_sent, 0) + coalesce(messages_received, 0) AS messages_total 
FROM Table1 
FULL JOIN Table2 USING (me, friends) 
ORDER BY me; 
+0

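I think in this case you could change messages_sent + messages_received to COALESCE(messages_sent, 0) + COALESCE(messages_received, 0), so that the sum is not NULL when the first or second query has no result for a particular combination. –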
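
The NULL behaviour described in this comment can be seen in a small sketch. It is not from the thread: it uses Python's built-in sqlite3 module instead of PostgreSQL, the function name null_demo and the two sample pairs are chosen for illustration, and it uses a LEFT JOIN rather than a FULL JOIN because older SQLite versions do not support FULL JOIN. The unmatched side of an outer join is filled with NULL in the same way in both cases.

```python
import sqlite3


def null_demo():
    """Hypothetical demo: a plain sum versus a COALESCE-guarded sum on an outer join."""
    conn = sqlite3.connect(":memory:")  # SQLite stands in for PostgreSQL here
    conn.executescript(
        """
        CREATE TABLE Table1 (me INTEGER, friends INTEGER, messages_sent INTEGER);
        CREATE TABLE Table2 (me INTEGER, friends INTEGER, messages_received INTEGER);
        INSERT INTO Table1 VALUES (0, 1, 10);
        INSERT INTO Table1 VALUES (1, 1, 5);   -- no matching row in Table2
        INSERT INTO Table2 VALUES (0, 1, 9);
        """
    )
    rows = conn.execute(
        """
        SELECT me, friends,
               messages_sent + messages_received AS plain_total,
               coalesce(messages_sent, 0) + coalesce(messages_received, 0) AS safe_total
        FROM Table1
        LEFT JOIN Table2 USING (me, friends)
        ORDER BY me, friends
        """
    ).fetchall()
    conn.close()
    return rows


for row in null_demo():
    print(row)
```

For the pair (1, 1), which has no row in Table2, plain_total comes back as NULL (None in Python) while safe_total is 5; for the matched pair (0, 1) both columns are 19.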