2016-02-04

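Why are CUBE, ROLLUP and GROUPING SETS in PostgreSQL 9.5 slower than the equivalent UNION?

I've been really looking forward to the new PostgreSQL 9.5 features and will be upgrading our database soon. However, I was quite surprised to find that on our data the query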

SELECT col1, col2, count(*), grouping(col1,col2) 
FROM table1 
GROUP BY CUBE(col1, col2) 

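actually runs much slower (about 3 s) than the sum of the durations of the equivalent queries (about 1 s in total for all 4 queries, 100-300 ms each). Both col1 and col2 are indexed.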
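For reference, `CUBE(col1, col2)` is shorthand for the four grouping sets `(col1, col2)`, `(col1)`, `(col2)` and `()`, which is why it is compared against four separate queries. Below is a minimal Python sketch that enumerates those sets and the value `grouping(col1, col2)` reports for each. The helper name `cube_sets` is hypothetical, and the sketch assumes the convention from the PostgreSQL documentation: a bit is 1 when the argument is *not* part of the current grouping set, with the rightmost argument in the least significant bit.

```python
def cube_sets(columns):
    """Expand CUBE(columns) into its grouping sets, each paired with the
    GROUPING(columns...) bitmask for rows produced by that set.
    (Hypothetical helper for illustration, not a PostgreSQL API.)"""
    n = len(columns)
    result = []
    for mask in range(2 ** n):
        # Bit (n - 1 - i) set means columns[i] is excluded from this set;
        # the rightmost column maps to the least significant bit.
        kept = tuple(
            col for i, col in enumerate(columns)
            if not mask & (1 << (n - 1 - i))
        )
        result.append((kept, mask))
    return result


for grouping_set, grouping_value in cube_sets(["col1", "col2"]):
    print(grouping_set, grouping_value)
```

Iterating the mask upward also reproduces the order in which the PostgreSQL docs list the expansion of `CUBE(a, b, c)`.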

Is this expected (meaning the feature is more about compatibility than performance), or can it be tuned somehow?

Here is an example on a vacuumed production table:

> explain analyze select service_name, state, res_id, count(*) from bookings group by rollup(service_name, state, res_id); 
                  QUERY PLAN 
------------------------------------------------------------------------------------------------------------------------------- 
GroupAggregate (cost=43069.12..45216.05 rows=4161 width=24) (actual time=1027.341..1120.675 rows=428 loops=1) 
    Group Key: service_name, state, res_id 
    Group Key: service_name, state 
    Group Key: service_name 
    Group Key: () 
    -> Sort (cost=43069.12..43490.18 rows=168426 width=24) (actual time=1027.301..1070.321 rows=168426 loops=1) 
     Sort Key: service_name, state, res_id 
     Sort Method: external merge Disk: 5728kB 
     -> Seq Scan on bookings (cost=0.00..28448.26 rows=168426 width=24) (actual time=0.079..147.619 rows=168426 loops=1) 
Planning time: 0.118 ms 
Execution time: 1122.557 ms 
(11 rows) 
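The plan above computes all four ROLLUP levels from a single Sort feeding one GroupAggregate with four Group Keys. A minimal Python sketch of why one sorted pass is enough, assuming `count(*)` only: once the input is sorted on `(service_name, state, res_id)`, every key prefix forms a contiguous run, so each level's group can be emitted as soon as its prefix changes. The function name `rollup_count` is hypothetical, and rolled-up groups are shown as shorter tuples where SQL would show NULLs.

```python
def rollup_count(rows, ncols):
    """count(*) for ROLLUP over the first ncols columns, computed in one
    pass over sorted input. Rolled-up groups are returned as shorter key
    tuples (SQL would show the missing columns as NULL instead)."""
    ordered = sorted(rows)  # the Sort step: every input row is buffered
    # counts[k] is the running count for the key prefix of length k;
    # k == ncols is the full key, k == 0 is the grand total ().
    counts = [0] * (ncols + 1)
    results = []
    prev = None
    for row in ordered:
        key = tuple(row[:ncols])
        if prev is not None:
            # Sorted input keeps each prefix group contiguous: when a prefix
            # changes, that group is complete and can be emitted.
            for k in range(ncols, 0, -1):
                if key[:k] != prev[:k]:
                    results.append((prev[:k], counts[k]))
                    counts[k] = 0
        for k in range(ncols + 1):
            counts[k] += 1
        prev = key
    if prev is None:
        return [((), 0)]  # the empty grouping set still yields one row
    # Flush the last group at every level, including the grand total.
    for k in range(ncols, -1, -1):
        results.append((prev[:k], counts[k]))
    return results


rows = [("a", "x", 1), ("a", "x", 1), ("a", "y", 2), ("b", "x", 1)]
for key, count in rollup_count(rows, 3):
    print(key, count)
```

The aggregation itself is cheap; the cost is in `sorted(rows)`, which has to hold every input row, matching the 1027 ms spent in the Sort node.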

> explain analyze select service_name, state, res_id, count(*) from bookings group by service_name, state, res_id 
UNION ALL select service_name, state, NULL, count(*) from bookings group by service_name, state 
UNION ALL select service_name, NULL, NULL, count(*) from bookings group by service_name 
UNION ALL select NULL, NULL, NULL, count(*) from bookings; 
                   QUERY PLAN 
----------------------------------------------------------------------------------------------------------------------------------------- 
Append (cost=30132.52..118086.91 rows=4161 width=32) (actual time=208.986..706.347 rows=428 loops=1) 
    -> HashAggregate (cost=30132.52..30172.12 rows=3960 width=24) (actual time=208.986..209.078 rows=305 loops=1) 
     Group Key: bookings.service_name, bookings.state, bookings.res_id 
     -> Seq Scan on bookings (cost=0.00..28448.26 rows=168426 width=24) (actual time=0.022..97.637 rows=168426 loops=1) 
    -> HashAggregate (cost=29711.45..29713.25 rows=180 width=20) (actual time=195.851..195.879 rows=96 loops=1) 
     Group Key: bookings_1.service_name, bookings_1.state 
     -> Seq Scan on bookings bookings_1 (cost=0.00..28448.26 rows=168426 width=20) (actual time=0.029..95.588 rows=168426 loops=1) 
    -> HashAggregate (cost=29290.39..29290.59 rows=20 width=11) (actual time=181.955..181.960 rows=26 loops=1) 
     Group Key: bookings_2.service_name 
     -> Seq Scan on bookings bookings_2 (cost=0.00..28448.26 rows=168426 width=11) (actual time=0.030..97.047 rows=168426 loops=1) 
    -> Aggregate (cost=28869.32..28869.33 rows=1 width=0) (actual time=119.332..119.332 rows=1 loops=1) 
     -> Seq Scan on bookings bookings_3 (cost=0.00..28448.26 rows=168426 width=0) (actual time=0.039..93.508 rows=168426 loops=1) 
Planning time: 0.373 ms 
Execution time: 706.558 ms 
(14 rows) 

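The total times are comparable, but the latter does four scans, so shouldn't it be much slower? The "external merge Disk" sort when using rollup() is also strange, since I have work_mem set to 16M.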
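Four scans are not necessarily slower here: in the UNION ALL plan each Seq Scan takes roughly 95 ms, and each HashAggregate only keeps one entry per group (305, 96, 26 and 1 groups), so nothing is sorted and nothing spills to disk. A minimal Python sketch of that strategy, with `collections.Counter` standing in for the hash table; the function name `union_all_rollup_count` is hypothetical and only `count(*)` is modelled.

```python
from collections import Counter


def union_all_rollup_count(rows, ncols):
    """count(*) for the same levels as ROLLUP, computed like the UNION ALL
    plan: one full pass and one hash table per level.
    rows must be re-iterable (e.g. a list), since it is read ncols + 1 times."""
    results = []
    for k in range(ncols, -1, -1):
        # The Counter plays the role of a HashAggregate's hash table:
        # it holds one entry per distinct group, not one per input row.
        table = Counter(tuple(row[:k]) for row in rows)
        if k == 0 and not table:
            table[()] = 0  # an aggregate without GROUP BY returns one row
        results.extend(table.items())
    return results


rows = [("a", "x", 1), ("a", "x", 1), ("a", "y", 2), ("b", "x", 1)]
print(dict(union_all_rollup_count(rows, 3)))
```

The trade-off is repeated reads of the input versus buffering all of it once; with a table that scans in about 100 ms, the repeated reads are the cheaper option.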

Show us the plan using 'explain (analyze, verbose)'. –

Added the execution plans for an example. CUBE() on the same columns makes an even bigger difference. – codesnik

The sort (external merge sort) takes most of the time, right? 1027+ ms, or am I misreading it? –

Answers

Interesting: in this particular example, SET work_mem = '32MB' gets rid of the disk merge, and ROLLUP is now 2x faster than the corresponding UNION.

EXPLAIN ANALYZE now shows: "Sort Method: quicksort  Memory: 19301kB"

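I still don't understand why so much memory is needed for a mere 400 rows of output, or why a 7MB disk merge turns into 19MB in memory (quicksort overhead?), but my problem is solved.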
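The memory is driven by the input, not the output: the Sort node buffers all 168426 rows of the table before the GroupAggregate produces its ~400 rows. A back-of-the-envelope check using only figures from the plans in this thread (note the in-memory figure comes from the work_mem = 32MB run and the disk figure from the 16M run, both sorting the same rows):

```python
# Figures copied from the EXPLAIN ANALYZE output quoted in this thread.
sorted_rows = 168426   # rows fed into the Sort node (the whole table)
in_memory_kb = 19301   # "Sort Method: quicksort  Memory: 19301kB" (work_mem = 32MB)
on_disk_kb = 5728      # "Sort Method: external merge  Disk: 5728kB" (work_mem = 16M)

bytes_per_row_memory = in_memory_kb * 1024 / sorted_rows
bytes_per_row_disk = on_disk_kb * 1024 / sorted_rows

# Does the in-memory footprint fit under each work_mem setting?
fits_16mb = in_memory_kb <= 16 * 1024
fits_32mb = in_memory_kb <= 32 * 1024

print(f"in memory: {bytes_per_row_memory:.0f} bytes/row")
print(f"on disk:   {bytes_per_row_disk:.0f} bytes/row")
print(fits_16mb, fits_32mb)
```

That works out to roughly 117 bytes per row in memory versus about 35 on disk, and a 19301kB sort does not fit in 16MB but does fit in 32MB, which matches the switch from external merge to quicksort. The gap between the two per-row figures is plausibly per-tuple bookkeeping (the sort's slot array and memory-allocation overhead) that is not written to the temporary files; treat that as an assumption, since the plans themselves do not show it.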

The sort is working on 168k rows, isn't it? –

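Yes, you're right. That's the whole table! Does this mean ROLLUP/CUBE/GROUPING SETS can only work in this (more or less) naive way, or are there cases where it does something smarter when that makes sense? – codesnik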

It seems grouping sets always get a GroupAggregate + Sort query plan, whereas a standard GROUP BY frequently uses HashAggregate.
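For comparison, here is a minimal Python sketch of the hash-based alternative: a single pass over the input that updates one hash table per grouping set, so memory grows with the number of groups rather than the number of rows. This illustrates the idea only and is not what the 9.5 planner does; hashed grouping sets were added in a later release (PostgreSQL 10). The function name `hashed_grouping_sets` is hypothetical, grouping sets are given as tuples of column positions, and only `count(*)` is modelled.

```python
from collections import Counter


def hashed_grouping_sets(rows, grouping_sets):
    """count(*) for several grouping sets in one pass over rows, with one
    hash table (Counter) per grouping set. Each grouping set is a tuple of
    column positions, e.g. CUBE(col1, col2) -> [(0, 1), (0,), (1,), ()]."""
    tables = {gs: Counter() for gs in grouping_sets}
    for row in rows:  # a single scan of the input, no sort
        for gs, table in tables.items():
            table[tuple(row[i] for i in gs)] += 1
    results = []
    for gs, table in tables.items():
        if not gs and not table:
            table[()] = 0  # the empty grouping set always yields one row
        # Tag each key with its grouping set so (col1) and (col2) keys
        # cannot collide.
        results.extend(((gs, key), count) for key, count in table.items())
    return results


rows = [("a", "x"), ("a", "y"), ("b", "x")]
for (gs, key), count in hashed_grouping_sets(rows, [(0, 1), (0,), (1,), ()]):
    print(gs, key, count)
```

The catch is that every hash table must fit in memory at once, which is one reason a sort-based plan remains the safer general fallback.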