You can simply take the union of the two tables, then GROUP BY the me and friends combination and apply an aggregate function to total the messages:
SELECT me, friends, sum(count) AS messages_total
FROM (
SELECT me, friends, messages_sent AS count FROM Table1
UNION ALL
SELECT me, friends, messages_received FROM Table2
) AS t
GROUP BY me, friends;
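To see what the query above returns, here is a minimal sketch on a few made-up rows. The tables t1_demo and t2_demo and their contents are hypothetical, chosen only for illustration, and are not from the question. Note that pairs present in only one of the two tables still show up in the result, which is why no coalesce is needed here.

```sql
-- Hypothetical sample data (not from the original question).
CREATE TEMP TABLE t1_demo (me integer, friends integer, messages_sent integer);
CREATE TEMP TABLE t2_demo (me integer, friends integer, messages_received integer);

INSERT INTO t1_demo VALUES (1, 2, 5), (1, 3, 2);
INSERT INTO t2_demo VALUES (1, 2, 4), (2, 1, 7);

-- Same shape as the query above: stack both tables, then sum per (me, friends).
SELECT me, friends, sum(count) AS messages_total
FROM (
  SELECT me, friends, messages_sent AS count FROM t1_demo
  UNION ALL
  SELECT me, friends, messages_received FROM t2_demo
) AS t
GROUP BY me, friends
ORDER BY me, friends;
-- (1, 2) appears in both tables: 5 + 4 = 9
-- (1, 3) appears only in t1_demo: 2
-- (2, 1) appears only in t2_demo: 7
```

UNION ALL is used rather than UNION so that identical rows coming from the two tables are kept and counted instead of being deduplicated.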
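Edit: I was about to edit my answer to add a note recommending Patrick's answer as the better one, but I decided to run a simple benchmark for fun. So, given the following setup (1 million rows in each table):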
CREATE TABLE table1 (
me integer not null,
friends integer not null,
messages_sent integer not null
);
CREATE TABLE table2 (
me integer not null,
friends integer not null,
messages_received integer not null
);
INSERT INTO table1 SELECT n1, n2, floor(random()*10)::integer FROM generate_series(1, 1000) t1(n1), generate_series(1, 1000) t2(n2);
INSERT INTO table2 SELECT n1, n2, floor(random()*10)::integer FROM generate_series(1, 1000) t1(n1), generate_series(1, 1000) t2(n2);
CREATE INDEX ON table1(me, friends);
CREATE INDEX ON table2(me, friends);
ANALYZE;
Then the first solution:
$ EXPLAIN ANALYZE
SELECT me, friends, sum(count) AS messages_total
FROM (
SELECT me, friends, messages_sent AS count FROM Table1
UNION ALL
SELECT me, friends, messages_received FROM Table2
) AS t
GROUP BY me, friends;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
HashAggregate (cost=45812.00..46212.00 rows=40000 width=12) (actual time=1201.602..1499.285 rows=1000000 loops=1)
Group Key: table1.me, table1.friends
-> Append (cost=0.00..30812.00 rows=2000000 width=12) (actual time=0.022..299.260 rows=2000000 loops=1)
-> Seq Scan on table1 (cost=0.00..15406.00 rows=1000000 width=12) (actual time=0.020..91.357 rows=1000000 loops=1)
-> Seq Scan on table2 (cost=0.00..15406.00 rows=1000000 width=12) (actual time=0.004..77.672 rows=1000000 loops=1)
Planning time: 0.255 ms
Execution time: 1529.642 ms
And the second solution:
$ EXPLAIN ANALYZE
SELECT me, friends,
coalesce(messages_sent, 0) + coalesce(messages_received, 0) AS messages_total
FROM Table1
FULL JOIN Table2 USING (me, friends)
ORDER BY me;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------
Sort (cost=219582.13..222082.13 rows=1000000 width=24) (actual time=1501.873..1583.915 rows=1000000 loops=1)
Sort Key: (COALESCE(table1.me, table2.me))
Sort Method: external sort Disk: 21512kB
-> Merge Full Join (cost=0.85..99414.29 rows=1000000 width=24) (actual time=0.074..912.598 rows=1000000 loops=1)
Merge Cond: ((table1.me = table2.me) AND (table1.friends = table2.friends))
-> Index Scan using table1_me_friends_idx on table1 (cost=0.42..38483.49 rows=1000000 width=12) (actual time=0.039..165.772 rows=1000000 loops=1)
-> Index Scan using table2_me_friends_idx on table2 (cost=0.42..38483.49 rows=1000000 width=12) (actual time=0.018..194.177 rows=1000000 loops=1)
Planning time: 1.091 ms
Execution time: 1615.011 ms
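So, surprisingly, even though it can make use of the indexes, the FULL JOIN solution performs slightly worse. I suspect this has to do with it being a full join; with other join types it would probably do much better.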
If you are using PostgreSQL, don't use the MySQL tag. – Barmar