How does MySQL implement "GROUP BY"?

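I read in the MySQL Reference Manual that when MySQL can use an index, it simply performs an index scan; otherwise it creates a tmp table and does things like a filesort. I also read in other articles that the "GROUP BY" result is sorted by the grouping columns by default, and that if an "ORDER BY NULL" clause is added, no filesort is performed. The difference can be seen in the output of "EXPLAIN ...". So my question is: what is the difference between a "GROUP BY" clause with "ORDER BY NULL" and one without it? I tried using profiling to see what MySQL does in the background, but all I can see is results like these: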

result for group clause without order by null:
| preparing            | 0.000016 |
| Creating tmp table   | 0.000048 |
| executing            | 0.000009 |
| Copying to tmp table | 0.000109 |
| Sorting result       | 0.000023 |
| Sending data         | 0.000027 |

result for clause with "order by null":
| preparing            | 0.000016 |
| Creating tmp table   | 0.000052 |
| executing            | 0.000009 |
| Copying to tmp table | 0.000114 |
| Sending data         | 0.000028 |

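So my guess at what MySQL does when "ORDER BY NULL" is added is that it does not use the filesort algorithm. Maybe when it creates the tmp table it also uses an index, then uses that index to carry out the grouping, and when it is finished it just reads the result from the table rows without sorting it.

But my original opinion was that MySQL could use quicksort to sort the items and then do the grouping, so the result would also come out sorted.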
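The "sort first, then group" idea from the previous paragraph can be sketched as follows. This is only an illustration of that hypothesis, not MySQL's actual code: Python's built-in `sorted` stands in for whatever sort the server would use, and the `posts` rows (author, integer date) are made-up sample data.

```python
from itertools import groupby
from operator import itemgetter


def sort_then_group_max(rows):
    """Sort rows by the grouping column, then aggregate each run of equal keys."""
    ordered = sorted(rows, key=itemgetter(0))  # stand-in for the server-side sort
    return [
        (author, max(date for _, date in run))  # MAX(post_date) per author
        for author, run in groupby(ordered, key=itemgetter(0))
    ]


# Hypothetical (post_author, post_date) rows; dates are plain integers here.
posts = [("carol", 3), ("alice", 5), ("carol", 9), ("bob", 1), ("alice", 2)]
grouped = sort_then_group_max(posts)
print(grouped)
```

With this approach the output is ordered by the grouping column as a side effect of the sort, which is why a sort-based strategy would give sorted groups for free.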

Any opinions are appreciated, thanks.

Source

2010-03-17 user188916

-1

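GROUP BY groups records by one or more columns. For example, if you have a column "class", you can group by that column to get the records grouped according to its values.

Source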

2010-03-17 09:10:25 werd

mysql> select max(post_date),post_author from wp_posts 
-> where id > 10 and id < 1000 
-> group by post_author; 
+---------------------+-------------+
| max(post_date)      | post_author |
+---------------------+-------------+
| 2009-07-03 12:58:39 |           1 |
+---------------------+-------------+
1 row in set (0.01 sec) 

mysql> show profiles; 
+----------+------------+-------+
| Query_ID | Duration | Query |
+----------+------------+-------+
| 1 | 0.00013200 | SELECT DATABASE() |
| 2 | 0.00030900 | show databases |
| 3 | 0.00030400 | show tables |
| 4 | 0.01180000 | select max(post_date),post_author from wp_posts where id > 10 and id < 1000 group by post_author |
+----------+------------+-------+
4 rows in set (0.00 sec)

mysql> show profile cpu,block io for query 4; 
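+----------------------+----------+----------+------------+--------------+---------------+
| Status | Duration | CPU_user | CPU_system | Block_ops_in | Block_ops_out |
+----------------------+----------+----------+------------+--------------+---------------+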
| starting | 0.000085 | 0.000000 | 0.000000 | 0 | 0 | 
| Opening tables | 0.000010 | 0.000000 | 0.000000 | 0 | 0 | 
| System lock | 0.000005 | 0.000000 | 0.000000 | 0 | 0 | 
| Table lock | 0.000008 | 0.000000 | 0.000000 | 0 | 0 | 
| init | 0.000029 | 0.000000 | 0.000000 | 0 | 0 | 
| optimizing | 0.000014 | 0.000000 | 0.000000 | 0 | 0 | 
| statistics | 0.000062 | 0.000000 | 0.000000 | 0 | 0 | 
| preparing | 0.000016 | 0.000000 | 0.000000 | 0 | 0 | 
| Creating tmp table | 0.000035 | 0.000000 | 0.000000 | 0 | 0 | 
| executing | 0.000004 | 0.000000 | 0.000000 | 0 | 0 | 
| Copying to tmp table | 0.011386 | 0.004999 | 0.006999 | 0 | 0 | 
| Sorting result | 0.000044 | 0.000000 | 0.000000 | 0 | 0 | 
| Sending data | 0.000036 | 0.000000 | 0.000000 | 0 | 0 | 
| end | 0.000004 | 0.000000 | 0.000000 | 0 | 0 | 
| removing tmp table | 0.000012 | 0.000000 | 0.000000 | 0 | 0 | 
| end | 0.000004 | 0.000000 | 0.000000 | 0 | 0 | 
| end | 0.000004 | 0.000000 | 0.000000 | 0 | 0 | 
| query end | 0.000004 | 0.000000 | 0.000000 | 0 | 0 | 
| freeing items | 0.000013 | 0.000000 | 0.000000 | 0 | 0 | 
| closing tables | 0.000018 | 0.000000 | 0.000000 | 0 | 0 | 
| logging slow query | 0.000003 | 0.000000 | 0.000000 | 0 | 0 | 
| cleaning up | 0.000004 | 0.000000 | 0.000000 | 0 | 0 | 
+----------------------+----------+----------+------------+--------------+---------------+
22 rows in set (0.00 sec) 

mysql> 
mysql> 
mysql> select max(post_date),post_author from wp_posts 
-> where id > 10 and id < 1000 
-> group by post_author order by null; 
+---------------------+-------------+
| max(post_date)      | post_author |
+---------------------+-------------+
| 2009-07-03 12:58:39 |           1 |
+---------------------+-------------+
1 row in set (0.01 sec) 

mysql> show profiles; 
+----------+------------+-------+
| Query_ID | Duration | Query |
+----------+------------+-------+
| 1 | 0.00013200 | SELECT DATABASE() |
| 2 | 0.00030900 | show databases |
| 3 | 0.00030400 | show tables |
| 4 | 0.01180000 | select max(post_date),post_author from wp_posts where id > 10 and id < 1000 group by post_author |
| 5 | 0.01177700 | select max(post_date),post_author from wp_posts where id > 10 and id < 1000 group by post_author order by null |
+----------+------------+-------+
5 rows in set (0.00 sec)

mysql> show profile cpu,block io for query 5; 
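+----------------------+----------+----------+------------+--------------+---------------+
| Status | Duration | CPU_user | CPU_system | Block_ops_in | Block_ops_out |
+----------------------+----------+----------+------------+--------------+---------------+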
| starting | 0.000097 | 0.000000 | 0.000000 | 0 | 0 | 
| Opening tables | 0.000013 | 0.000000 | 0.000000 | 0 | 0 | 
| System lock | 0.000006 | 0.000000 | 0.000000 | 0 | 0 | 
| Table lock | 0.000008 | 0.000000 | 0.000000 | 0 | 0 | 
| init | 0.000032 | 0.000000 | 0.000000 | 0 | 0 | 
| optimizing | 0.000012 | 0.000000 | 0.000000 | 0 | 0 | 
| statistics | 0.000065 | 0.000000 | 0.000000 | 0 | 0 | 
| preparing | 0.000017 | 0.000000 | 0.000000 | 0 | 0 | 
| Creating tmp table | 0.000040 | 0.000000 | 0.000000 | 0 | 0 | 
| executing | 0.000003 | 0.000000 | 0.000000 | 0 | 0 | 
| Copying to tmp table | 0.011369 | 0.005999 | 0.004999 | 0 | 0 | 
| Sending data | 0.000040 | 0.000000 | 0.000000 | 0 | 0 | 
| end | 0.000004 | 0.000000 | 0.000000 | 0 | 0 | 
| removing tmp table | 0.000031 | 0.000000 | 0.000000 | 0 | 0 | 
| end | 0.000005 | 0.000000 | 0.000000 | 0 | 0 | 
| end | 0.000004 | 0.000000 | 0.000000 | 0 | 0 | 
| query end | 0.000004 | 0.000000 | 0.000000 | 0 | 0 | 
| freeing items | 0.000012 | 0.000000 | 0.000000 | 0 | 0 | 
| closing tables | 0.000009 | 0.000000 | 0.000000 | 0 | 0 | 
| logging slow query | 0.000003 | 0.000000 | 0.000000 | 0 | 0 | 
| cleaning up | 0.000003 | 0.000000 | 0.000000 | 0 | 0 | 
+----------------------+----------+----------+------------+--------------+---------------+
21 rows in set (0.00 sec)

From this we can see that the second query has no "Sorting result" step, so there is a small effect on performance.
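One way to picture the two profiles is a single aggregation pass into a temporary structure, followed by an optional sort. The sketch below is a simplified model and not MySQL's implementation: the dictionary stands in for the temporary table, the `sort_result` flag stands in for the presence or absence of ORDER BY NULL, and the sample rows are made up.

```python
from typing import Dict, Iterable, List, Tuple


def group_max(rows: Iterable[Tuple[str, int]], sort_result: bool = True) -> List[Tuple[str, int]]:
    """Return (author, max date) per author, optionally sorted by author."""
    tmp_table: Dict[str, int] = {}  # models the tmp table keyed by the GROUP BY column
    for author, post_date in rows:  # models "Copying to tmp table": one pass over the input
        current = tmp_table.get(author)
        if current is None or post_date > current:
            tmp_table[author] = post_date
    result = list(tmp_table.items())  # order of first appearance in the input
    if sort_result:  # models "Sorting result"; skipped for ORDER BY NULL
        result.sort(key=lambda row: row[0])
    return result


# Hypothetical (post_author, post_date) rows; dates are plain integers here.
sample = [("carol", 3), ("alice", 5), ("carol", 9), ("bob", 1), ("alice", 2)]
sorted_groups = group_max(sample)
unsorted_groups = group_max(sample, sort_result=False)
print(sorted_groups)
print(unsorted_groups)
```

Both calls return the same groups; only the order differs, and the extra sort touches just the grouped rows, which fits the small "Sorting result" duration in the first profile.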

Source

2010-03-18 02:00:26 oyishi

Well, I already mentioned that above. What I really want to know is how MySQL carries out the "group by" operation. – user188916 2010-03-18 03:16:38

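The GROUP BY clause permits a WITH ROLLUP modifier that causes extra rows to be added to the summary output. These rows represent higher-level (or super-aggregate) summary operations. ROLLUP thus lets you answer questions at multiple levels of analysis with a single query. It can be used, for example, to provide support for OLAP (Online Analytical Processing) operations.

Suppose that a table named sales has year, country, product, and profit columns for recording sales profitability:

CREATE TABLE sales
(
    year    INT NOT NULL,
    country VARCHAR(20) NOT NULL,
    product VARCHAR(32) NOT NULL,
    profit  INT
);

The table's contents can be summarized per year with a simple GROUP BY like this:

mysql> SELECT year, SUM(profit) FROM sales GROUP BY year;
+------+-------------+
| year | SUM(profit) |
+------+-------------+
| 2000 |        4525 |
| 2001 |        3010 |
+------+-------------+

This output shows the total profit for each year, but if you also want to determine the total profit summed over all years, you must add up the individual values yourself or run an additional query.

Or you can use ROLLUP, which provides both levels of analysis with a single query. Adding a WITH ROLLUP modifier to the GROUP BY clause causes the query to produce another row that shows the grand total over all year values:

mysql> SELECT year, SUM(profit) FROM sales GROUP BY year WITH ROLLUP;
+------+-------------+
| year | SUM(profit) |
+------+-------------+
| 2000 |        4525 |
| 2001 |        3010 |
| NULL |        7535 |
+------+-------------+

The grand total super-aggregate line is identified by the value NULL in the year column.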
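The single-column case can be reproduced outside the database to check the arithmetic. The following is a minimal sketch, not how the server computes it: the (year, profit) pairs are the per-row profits from the sales example in this answer, `None` plays the role of NULL, and the function name is made up for illustration.

```python
from itertools import groupby

# (year, profit) pairs taken from the sales example rows in this answer.
sales_by_row = [
    (2000, 1500), (2000, 100), (2000, 150), (2000, 1200), (2000, 75), (2000, 1500),
    (2001, 10), (2001, 50), (2001, 2700), (2001, 250),
]


def sum_by_year_with_rollup(rows):
    """Per-year totals followed by one grand-total row keyed by None."""
    out = []
    grand_total = 0
    ordered = sorted(rows, key=lambda row: row[0])
    for year, run in groupby(ordered, key=lambda row: row[0]):
        subtotal = sum(profit for _, profit in run)
        out.append((year, subtotal))
        grand_total += subtotal
    out.append((None, grand_total))  # the super-aggregate row
    return out


yearly = sum_by_year_with_rollup(sales_by_row)
print(yearly)
```

The last tuple corresponds to the NULL row in the WITH ROLLUP output.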

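ROLLUP has a more complex effect when there are multiple GROUP BY columns. In this case, each time there is a "break" (change in value) in any but the last grouping column, the query produces an extra super-aggregate summary row.

For example, without ROLLUP, a summary of the sales table based on year, country, and product might look like this:

mysql> SELECT year, country, product, SUM(profit)
    -> FROM sales
    -> GROUP BY year, country, product;
+------+---------+------------+-------------+
| year | country | product    | SUM(profit) |
+------+---------+------------+-------------+
| 2000 | Finland | Computer   |        1500 |
| 2000 | Finland | Phone      |         100 |
| 2000 | India   | Calculator |         150 |
| 2000 | India   | Computer   |        1200 |
| 2000 | USA     | Calculator |          75 |
| 2000 | USA     | Computer   |        1500 |
| 2001 | Finland | Phone      |          10 |
| 2001 | USA     | Calculator |          50 |
| 2001 | USA     | Computer   |        2700 |
| 2001 | USA     | TV         |         250 |
+------+---------+------------+-------------+

The output shows summary values only at the year/country/product level of analysis. When ROLLUP is added, the query produces several extra rows:

mysql> SELECT year, country, product, SUM(profit)
    -> FROM sales
    -> GROUP BY year, country, product WITH ROLLUP;
+------+---------+------------+-------------+
| year | country | product    | SUM(profit) |
+------+---------+------------+-------------+
| 2000 | Finland | Computer   |        1500 |
| 2000 | Finland | Phone      |         100 |
| 2000 | Finland | NULL       |        1600 |
| 2000 | India   | Calculator |         150 |
| 2000 | India   | Computer   |        1200 |
| 2000 | India   | NULL       |        1350 |
| 2000 | USA     | Calculator |          75 |
| 2000 | USA     | Computer   |        1500 |
| 2000 | USA     | NULL       |        1575 |
| 2000 | NULL    | NULL       |        4525 |
| 2001 | Finland | Phone      |          10 |
| 2001 | Finland | NULL       |          10 |
| 2001 | USA     | Calculator |          50 |
| 2001 | USA     | Computer   |        2700 |
| 2001 | USA     | TV         |         250 |
| 2001 | USA     | NULL       |        3000 |
| 2001 | NULL    | NULL       |        3010 |
| NULL | NULL    | NULL       |        7535 |
+------+---------+------------+-------------+

For this query, adding ROLLUP causes the output to include summary information at four levels of analysis, not just one. Here is how to interpret the ROLLUP output: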

* Following each set of product rows for a given year and country, an extra summary row is produced showing the total for all products. These rows have the product column set to NULL.
* Following each set of rows for a given year, an extra summary row is produced showing the total for all countries and products. These rows have the country and product columns set to NULL.
* Finally, following all other rows, an extra summary row is produced showing the grand total for all years, countries, and products. This row has the year, country, and product columns set to NULL.
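The three rules above amount to keeping one running total per grouping level and emitting it whenever a column to its left changes value. The sketch below illustrates that idea under stated assumptions: it is not the server's algorithm, `rollup` and `SALES` are names chosen for this example, `None` stands for NULL, and the rows are the ones shown in the sales example.

```python
SALES = [
    (2000, "Finland", "Computer", 1500), (2000, "Finland", "Phone", 100),
    (2000, "India", "Calculator", 150), (2000, "India", "Computer", 1200),
    (2000, "USA", "Calculator", 75), (2000, "USA", "Computer", 1500),
    (2001, "Finland", "Phone", 10), (2001, "USA", "Calculator", 50),
    (2001, "USA", "Computer", 2700), (2001, "USA", "TV", 250),
]


def rollup(rows, n_keys):
    """Group on the first n_keys columns and add super-aggregate rows."""
    rows = sorted(rows, key=lambda row: row[:n_keys])
    out = []
    totals = [0] * (n_keys + 1)  # totals[k]: running sum for the current key prefix of length k
    prev = None

    def flush(down_to):
        # Emit the finished group, then each enclosing summary level down to `down_to`.
        for level in range(n_keys, down_to - 1, -1):
            out.append(prev[:level] + (None,) * (n_keys - level) + (totals[level],))
            totals[level] = 0

    for row in rows:
        key, value = row[:n_keys], row[n_keys]
        if prev is not None and key != prev:
            # Leftmost grouping column whose value changed (the "break").
            first_diff = next(i for i in range(n_keys) if key[i] != prev[i])
            flush(first_diff + 1)
        for level in range(n_keys + 1):
            totals[level] += value
        prev = key
    if prev is not None:
        flush(0)  # final group, its enclosing summaries, and the grand total
    return out


rolled = rollup(SALES, 3)
for line in rolled:
    print(line)
```

For the sample rows this yields 18 tuples in the same order as the WITH ROLLUP output, ending with the grand-total row.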

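Other Considerations When Using ROLLUP

The following items list some behaviors specific to the MySQL implementation of ROLLUP:

When you use ROLLUP, you cannot also use an ORDER BY clause to sort the results. In other words, ROLLUP and ORDER BY are mutually exclusive. However, you still have some control over sort order. GROUP BY in MySQL sorts results, and you can use explicit ASC and DESC keywords with the columns named in the GROUP BY list to specify the sort order for individual columns. (The higher-level summary rows added by ROLLUP still appear after the rows from which they are calculated, regardless of the sort order.)

LIMIT can be used to restrict the number of rows returned to the client. LIMIT is applied after ROLLUP, so the limit applies to the extra rows added by ROLLUP as well. For example:

mysql> SELECT year, country, product, SUM(profit)
    -> FROM sales
    -> GROUP BY year, country, product WITH ROLLUP
    -> LIMIT 5;
+------+---------+------------+-------------+
| year | country | product    | SUM(profit) |
+------+---------+------------+-------------+
| 2000 | Finland | Computer   |        1500 |
| 2000 | Finland | Phone      |         100 |
| 2000 | Finland | NULL       |        1600 |
| 2000 | India   | Calculator |         150 |
| 2000 | India   | Computer   |        1200 |
+------+---------+------------+-------------+

Using LIMIT with ROLLUP may produce results that are more difficult to interpret, because you have less context for understanding the super-aggregate rows.

The NULL indicators in each super-aggregate row are produced when the row is sent to the client. The server looks at the columns named in the GROUP BY clause following the leftmost one that has changed value. For any column in the result set with a name that is a lexical match to any of those names, its value is set to NULL. (If you specify grouping columns by column number, the server identifies which columns to set to NULL by number.)

Because the NULL values in the super-aggregate rows are placed into the result set at such a late stage in query processing, you cannot test them as NULL values within the query itself. For example, you cannot add HAVING product IS NULL to the query to eliminate from the output all but the super-aggregate rows.

On the other hand, the NULL values do appear as NULL on the client side and can be tested as such using any MySQL client programming interface.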
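Since the NULL markers are visible on the client, the filtering that HAVING cannot do can be done after fetching. The sketch below assumes a Python DB-API style client, where SQL NULL arrives as `None`; the `fetched` list is a hard-coded stand-in for what a cursor's `fetchall()` would return for part of the ROLLUP result, and `super_aggregate_rows` is a name made up for this example.

```python
# Stand-in for rows fetched from the WITH ROLLUP query; NULL is represented as None.
fetched = [
    (2000, "Finland", "Computer", 1500),
    (2000, "Finland", "Phone", 100),
    (2000, "Finland", None, 1600),
    (2000, None, None, 4525),
    (None, None, None, 7535),
]


def super_aggregate_rows(rows, n_keys=3):
    """Keep only rows whose last grouping column is None, i.e. the ROLLUP summaries."""
    return [row for row in rows if row[n_keys - 1] is None]


summaries = super_aggregate_rows(fetched)
print(summaries)
```

This works for the sales table because product is declared NOT NULL; if the last grouping column could hold real NULLs, a summary row and a genuine NULL group would look the same to this check.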

Source

2010-03-18 05:54:17
