2017-02-21 69 views
2

SQL ROLLUP or UNION? I am trying to get a grand total of the entries, but I am not convinced that ROLLUP is the best option:

SELECT BUSINESS_STATUS_NAME, 
    PENDING_ITEMS, 
    DATAGROUP 
FROM PAYMENTS 
WHERE STATUS LIKE '%PROCESS%'; 

This produces:

BUSINESS_STATUS_NAME  PENDING_ITEMS  DATAGROUP
PROCESSING DATA       34             PRODUCT
PROCESSING INS        40             SERVICE

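I would like a grand total row at the bottom, as shown below, but ROLLUP gives me subtotals because it includes the DATAGROUP column. I only need a grand total of the pending items, yet I still need to display DATAGROUP. Would a UNION ALL with a second query selecting SUM(PENDING_ITEMS) be better?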

BUSINESS_STATUS_NAME  PENDING_ITEMS  DATAGROUP
PROCESSING DATA       34             PRODUCT
PROCESSING INS        40             SERVICE
GRAND TOTAL           74

Thanks!

+2

Use ROLLUP for better performance. If you need subtotals, use it in the GROUP BY clause. See http://sql-plsql.blogspot.in/2010/10/rollup.html

Answers

0

You can use ROLLUP, but you need an aggregation query:

SELECT BUSINESS_STATUS_NAME, 
     SUM(PENDING_ITEMS) as PENDING_ITEMS, 
     DATAGROUP 
FROM PAYMENTS 
WHERE STATUS LIKE '%PROCESS%' 
GROUP BY ROLLUP (BUSINESS_STATUS_NAME, DATAGROUP); 

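I doubt there is a performance difference between this and a UNION ALL. Note, however, that this guarantees the summary row is the last row in the result set.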

+0

I believe you also need another pair of parentheses – Aleksej
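To illustrate the parentheses remark: in Oracle, ROLLUP (a, b) is shorthand for the grouping sets (a, b), (a) and (), while ROLLUP ((a, b)) treats the pair as a single composite column and produces only (a, b) and (). The following is a sketch of both forms spelled out with explicit GROUPING SETS, assuming the PAYMENTS columns from the question; it is meant to show the difference, not to replace the answer's query.

```sql
-- Same grouping as GROUP BY ROLLUP (BUSINESS_STATUS_NAME, DATAGROUP):
-- detail rows, one subtotal per BUSINESS_STATUS_NAME, and a grand total.
SELECT BUSINESS_STATUS_NAME,
       SUM(PENDING_ITEMS) AS PENDING_ITEMS,
       DATAGROUP
FROM PAYMENTS
WHERE STATUS LIKE '%PROCESS%'
GROUP BY GROUPING SETS ((BUSINESS_STATUS_NAME, DATAGROUP),
                        (BUSINESS_STATUS_NAME),
                        ());

-- Same grouping as GROUP BY ROLLUP ((BUSINESS_STATUS_NAME, DATAGROUP)):
-- detail rows and a grand total only, with no per-status subtotals.
SELECT BUSINESS_STATUS_NAME,
       SUM(PENDING_ITEMS) AS PENDING_ITEMS,
       DATAGROUP
FROM PAYMENTS
WHERE STATUS LIKE '%PROCESS%'
GROUP BY GROUPING SETS ((BUSINESS_STATUS_NAME, DATAGROUP),
                        ());
```

The second form is the one that matches the output requested in the question.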

+0

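UNION ALL may need to read the base table twice. Why would there be no performance difference compared with reading it only once (as in the 'rollup' solution)? – mathguy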

3

I would use ROLLUP, for both clarity and performance.

Say you have a sample table like this:

create table payments (business_status_name, pending_items, datagroup) as (
    select 'PROCESSING DATA', 10, 'PRODUCT' from dual union all 
    select 'PROCESSING DATA', 5, 'PRODUCT' from dual union all 
    select 'PROCESSING DATA', 2, 'SERVICE' from dual union all 
    select 'PROCESSING INS', 10, 'SERVICE' from dual union all 
    select 'PROCESSING INS', 10, 'SERVICE' from dual union all 
    select 'PROCESSING INS', 10, 'PRODUCT' from dual 
) 

This is the ROLLUP way (note the extra parentheses, which change the grouping logic):

SELECT BUSINESS_STATUS_NAME, 
     SUM(PENDING_ITEMS) as PENDING_ITEMS, 
     DATAGROUP 
FROM PAYMENTS 
GROUP BY ROLLUP ((BUSINESS_STATUS_NAME, DATAGROUP)) 

Result:

BUSINESS_STATUS PENDING_ITEMS DATAGRO
--------------- ------------- -------
PROCESSING INS             10 PRODUCT
PROCESSING INS             20 SERVICE
PROCESSING DATA            15 PRODUCT
PROCESSING DATA             2 SERVICE
                           47

The plan:

---------------------------------------------------------------------------------
| Id  | Operation            | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |          |     6 |   186 |     4  (25)| 00:00:01 |
|   1 |  SORT GROUP BY ROLLUP|          |     6 |   186 |     4  (25)| 00:00:01 |
|   2 |   TABLE ACCESS FULL  | PAYMENTS |     6 |   186 |     3   (0)| 00:00:01 |
---------------------------------------------------------------------------------
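The total row in the result above has NULL in both grouping columns. If a visible label is wanted, as in the question's desired output, one possible sketch uses Oracle's GROUPING function, which returns 1 on rows where the column was rolled up. The alias STATUS_LABEL is an illustrative name, not something from the original post, and the ORDER BY is added here to make the position of the total row explicit.

```sql
SELECT CASE
         -- GROUPING(...) = 1 only on the super-aggregate (grand total) row
         WHEN GROUPING(BUSINESS_STATUS_NAME) = 1 THEN 'GRAND TOTAL'
         ELSE BUSINESS_STATUS_NAME
       END AS STATUS_LABEL,            -- hypothetical alias
       SUM(PENDING_ITEMS) AS PENDING_ITEMS,
       DATAGROUP
FROM PAYMENTS
GROUP BY ROLLUP ((BUSINESS_STATUS_NAME, DATAGROUP))
-- Detail rows (GROUPING = 0) sort before the total row (GROUPING = 1)
ORDER BY GROUPING(BUSINESS_STATUS_NAME),
         BUSINESS_STATUS_NAME,
         DATAGROUP;
```

Using GROUPING rather than testing the column for NULL keeps the label correct even if BUSINESS_STATUS_NAME itself can contain NULL values.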

This is the UNION ALL way:

SELECT BUSINESS_STATUS_NAME, 
     SUM(PENDING_ITEMS) as PENDING_ITEMS, 
     DATAGROUP 
FROM PAYMENTS 
GROUP BY BUSINESS_STATUS_NAME, DATAGROUP 
UNION ALL 
SELECT NULL, SUM(PENDING_ITEMS), NULL 
FROM PAYMENTS; 

The result is the same as with ROLLUP:

BUSINESS_STATUS PENDING_ITEMS DATAGRO
--------------- ------------- -------
PROCESSING INS             20 SERVICE
PROCESSING INS             10 PRODUCT
PROCESSING DATA            15 PRODUCT
PROCESSING DATA             2 SERVICE
                           47

The plan is not as good, with TWO FULL SCANS:

--------------------------------------------------------------------------------
| Id  | Operation           | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |          |     7 |   199 |     7  (58)| 00:00:01 |
|   1 |  UNION-ALL          |          |       |       |            |          |
|   2 |   HASH GROUP BY     |          |     6 |   186 |     4  (25)| 00:00:01 |
|   3 |    TABLE ACCESS FULL| PAYMENTS |     6 |   186 |     3   (0)| 00:00:01 |
|   4 |   SORT AGGREGATE    |          |     1 |    13 |            |          |
|   5 |    TABLE ACCESS FULL| PAYMENTS |     6 |    78 |     3   (0)| 00:00:01 |
--------------------------------------------------------------------------------

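Of course this is only a tiny example with a handful of records and no indexes, so things may differ on a real table, but I still believe ROLLUP should do better than UNION ALL.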

In a simple case, exactly like yours, these would be the plans for the two approaches:

SELECT BUSINESS_STATUS_NAME, 
     SUM(PENDING_ITEMS) as PENDING_ITEMS, 
     DATAGROUP 
FROM PAYMENTS 
GROUP BY ROLLUP ((BUSINESS_STATUS_NAME, DATAGROUP)) 

---------------------------------------------------------------------------------
| Id  | Operation            | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |          |     2 |    62 |     4  (25)| 00:00:01 |
|   1 |  SORT GROUP BY ROLLUP|          |     2 |    62 |     4  (25)| 00:00:01 |
|   2 |   TABLE ACCESS FULL  | PAYMENTS |     2 |    62 |     3   (0)| 00:00:01 |
---------------------------------------------------------------------------------

SELECT BUSINESS_STATUS_NAME, 
     PENDING_ITEMS, 
     DATAGROUP 
FROM PAYMENTS 
UNION ALL 
SELECT NULL, 
     SUM(PENDING_ITEMS), 
     NULL 
FROM PAYMENTS  

--------------------------------------------------------------------------------
| Id  | Operation           | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |          |     3 |    75 |     6  (50)| 00:00:01 |
|   1 |  UNION-ALL          |          |       |       |            |          |
|   2 |   TABLE ACCESS FULL | PAYMENTS |     2 |    62 |     3   (0)| 00:00:01 |
|   3 |   SORT AGGREGATE    |          |     1 |    13 |            |          |
|   4 |    TABLE ACCESS FULL| PAYMENTS |     2 |    26 |     3   (0)| 00:00:01 |
--------------------------------------------------------------------------------

ROLLUP still has the better plan, with a single table scan.
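The answer does not say how the plans above were produced. One common way to get output of this shape in Oracle is EXPLAIN PLAN followed by DBMS_XPLAN.DISPLAY, sketched here for the ROLLUP query; the same two steps can be repeated for the UNION ALL variant to compare the number of TABLE ACCESS FULL operations.

```sql
-- Store the optimizer's plan for the statement in the plan table
EXPLAIN PLAN FOR
SELECT BUSINESS_STATUS_NAME,
       SUM(PENDING_ITEMS) AS PENDING_ITEMS,
       DATAGROUP
FROM PAYMENTS
GROUP BY ROLLUP ((BUSINESS_STATUS_NAME, DATAGROUP));

-- Format and display the most recently explained plan
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```

Row, byte and cost estimates depend on table size and statistics, so the figures on a real table will likely differ from those shown for the sample data.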

+0

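The important thing to look at when comparing plans is not the "cost" (which should only be compared across different execution plans for the **same** query, not across two different queries that solve the same problem, both correct but using different approaches). What matters is that 'union all' needs to access the base table **twice**. Despite Gordon's opinion to the contrary (in the other answer), this will almost certainly make the 'union all' query slower than the 'rollup' query (and possibly much slower). – mathguy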

+0

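Thanks for the explanation. I agree with you about the cost, but keep in mind that your result set is not how I want to display the data... It looks like the rollup requires further grouping, which makes no sense... I only need the grand total of all rows, with no further grouping (given that the initial grouping has already been performed). Does that make sense?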

+0

@Rob_E: I don't understand this. Given my sample table, what should the result be? – Aleksej