2012-12-22

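MySQL subquery combined with LEFT JOIN - optimization

MySQL seems unable to optimize a SELECT that joins against a GROUP BY subquery, and the query ends up with long execution times. There must be a known optimization for such a common scenario.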

Suppose we are trying to return all orders from the database, with a flag indicating whether each one is the customer's first order.

CREATE TABLE orders (`order` int, customer int, date date);
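For experimenting with the queries in this thread, a small data set helps. The rows below are a hypothetical sample: the customer/order pairs are copied from the SQL Fiddle output quoted in one of the answers, and `date` is left NULL because no dates are given. Seven rows will not reproduce the slowdown itself, which presumably only appears on a much larger table.

```sql
-- Hypothetical sample rows: customer/order pairs taken from the SQL Fiddle
-- output in the answers; `date` is omitted and therefore stays NULL.
INSERT INTO orders (`order`, customer) VALUES
    (1, 1), (2, 1), (3, 1),
    (4, 2), (5, 2),
    (6, 3),
    (7, 4);
```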

Retrieving each customer's first order is super fast:

SELECT customer, min(`order`) as first_order FROM orders GROUP BY customer;

However, once we join this with the full set of orders using a subquery, it becomes very slow:

SELECT `order`, first_order FROM orders LEFT JOIN (
    SELECT customer, min(`order`) as first_order FROM orders GROUP BY customer
) AS first_orders ON orders.`order`=first_orders.first_order;

I'm hoping there is a simple trick we're missing, because otherwise it is about 1000 times faster to do this:

CREATE TEMPORARY TABLE tmp_first_order AS
    SELECT customer, min(`order`) as first_order FROM orders GROUP BY customer;
CREATE INDEX tmp_boost ON tmp_first_order (first_order);

SELECT `order`, first_order FROM orders LEFT JOIN tmp_first_order
    ON orders.`order`=tmp_first_order.first_order;
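One way to see where the time goes, following the EXPLAIN suggestion in the comments, is to compare the two plans side by side. A likely explanation for the gap, assuming an older server such as MySQL 5.5: the subquery is materialized as a derived table with no index, so the join has to scan it for every row of `orders`. According to the MySQL manual's section on optimizing derived tables, 5.6.3 and later may add an index to a derived table automatically (typically shown as `<auto_key0>` in the `key` column), which is roughly what the temporary table plus `tmp_boost` does by hand. This sketch assumes `tmp_first_order` from the previous step still exists in the same session.

```sql
-- Plan for the slow version: the GROUP BY subquery appears as <derived2>.
-- On servers without automatic derived-table indexes, its `key` column is
-- expected to be NULL, i.e. every outer row scans the derived table.
EXPLAIN
SELECT o.`order`, f.first_order
FROM orders AS o
LEFT JOIN (SELECT customer, MIN(`order`) AS first_order
           FROM orders
           GROUP BY customer) AS f
       ON o.`order` = f.first_order;

-- Plan for the temporary-table version: the join can use the tmp_boost index.
EXPLAIN
SELECT o.`order`, t.first_order
FROM orders AS o
LEFT JOIN tmp_first_order AS t
       ON o.`order` = t.first_order;
```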

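EDIT
Inspired by option 3 as suggested by @ruakh, there is a less ugly workaround using INNER JOIN and UNION that has acceptable performance and doesn't need a temporary table. However, it is somewhat specific to our case, and I'm wondering whether a more general optimization exists.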

SELECT `order`, 'YES' as first FROM orders INNER JOIN (
    SELECT min(`order`) as first_order FROM orders GROUP BY customer
) AS first_orders_1 ON orders.`order`=first_orders_1.first_order
UNION
SELECT `order`, 'NO' as first FROM orders INNER JOIN (
    SELECT customer, min(`order`) as first_order FROM orders GROUP BY customer
) AS first_orders_2 ON first_orders_2.customer = orders.customer
    AND orders.`order` > first_orders_2.first_order;
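A possible tweak to this workaround, assuming `order` values are unique: the two branches can never produce the same row (an order either equals its customer's minimum or is greater than it), so `UNION ALL` should return the same rows as `UNION` while skipping the duplicate-elimination step. Neither form guarantees a row order, so an explicit `ORDER BY` is added in this sketch.

```sql
-- Assumes `order` is unique, so the 'YES' and 'NO' branches are disjoint
-- and UNION ALL can skip the duplicate-removal pass that UNION performs.
SELECT o.`order`, 'YES' AS `first`
FROM orders AS o
INNER JOIN (SELECT MIN(`order`) AS first_order
            FROM orders
            GROUP BY customer) AS first_orders_1
        ON o.`order` = first_orders_1.first_order
UNION ALL
SELECT o.`order`, 'NO' AS `first`
FROM orders AS o
INNER JOIN (SELECT customer, MIN(`order`) AS first_order
            FROM orders
            GROUP BY customer) AS first_orders_2
        ON first_orders_2.customer = o.customer
       AND o.`order` > first_orders_2.first_order
ORDER BY `order`;
```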

A few ideas: analyze the execution plan (EXPLAIN the query); indexes; a subquery instead of a LEFT JOIN. –


kristox, did you check my answer? –

Answers


Here are a few things you could try:

  1. Remove customer from the subquery's field list, since it isn't doing anything anyway:

    SELECT `order`,
           first_order
      FROM orders
      LEFT
      JOIN ( SELECT MIN(`order`) AS first_order
               FROM orders
              GROUP
                 BY customer
           ) AS first_orders
        ON orders.`order` = first_orders.first_order
    ;
    
  2. Alternatively, add customer to the ON clause, so that it actually does something for you:

    SELECT `order`,
           first_order
      FROM orders
      LEFT
      JOIN ( SELECT customer,
                    MIN(`order`) AS first_order
               FROM orders
              GROUP
                 BY customer
           ) AS first_orders
        ON orders.customer = first_orders.customer
       AND orders.`order` = first_orders.first_order
    ;
    
  3. Same as the previous one, but use INNER JOIN instead of LEFT JOIN, and convert your original ON clause into a CASE expression:

    SELECT `order`,
           CASE WHEN first_order = `order` THEN first_order END AS first_order
      FROM orders
     INNER
      JOIN ( SELECT customer,
                    MIN(`order`) AS first_order
               FROM orders
              GROUP
                 BY customer
           ) AS first_orders
        ON orders.customer = first_orders.customer
    ;
    
  4. Replace the whole JOIN approach with an uncorrelated IN-subquery in a CASE expression:

    SELECT `order`,
           CASE WHEN `order` IN
                       ( SELECT MIN(`order`)
                           FROM orders
                          GROUP
                             BY customer
                       )
                THEN `order`
            END AS first_order
      FROM orders
    ;
    
  5. Replace the whole JOIN approach with a correlated EXISTS-subquery in a CASE expression:

    SELECT `order`,
           CASE WHEN NOT EXISTS
                       ( SELECT 1
                           FROM orders AS o2
                          WHERE o2.customer = o1.customer
                            AND o2.`order` < o1.`order`
                       )
                THEN `order`
            END AS first_order
      FROM orders AS o1
    ;
    

(It's quite possible that some of the above will actually perform terribly, but I think they're all worth trying.)
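All five options either group `orders` by `customer` and take `MIN(order)`, or (option 5) look for an earlier order of the same customer. A composite index on `(customer, order)` serves both access patterns, so it may be worth adding before timing them. A sketch with a hypothetical index name: with such an index MySQL can often resolve the per-customer minimum with a loose index scan, reported as `Using index for group-by` in the `Extra` column of EXPLAIN, though whether it does depends on the optimizer's cost estimates.

```sql
-- Hypothetical index name; column order matters: customer first, then `order`,
-- so both GROUP BY customer / MIN(`order`) and the correlated lookup
-- "same customer, smaller order" can be served from the index.
CREATE INDEX idx_orders_customer_order ON orders (customer, `order`);

-- Check whether the per-customer MIN() can now be read from the index alone.
EXPLAIN
SELECT customer, MIN(`order`) AS first_order
FROM orders
GROUP BY customer;
```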


Awesome answer... –


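Good answer, @ruakh. Option 3 is interesting, but in your example it only returns the first orders: if you have 100 customers and 2000 orders, it returns only the 100 first orders. Inspired by your suggestions, I tried something with 'UNION' that seems to work. – kristox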


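@kristox: Re: "if you have 100 customers and 2000 orders, then [option 3] will only return the 100 first orders": that's not true. Are you sure you copied the 'ON' clause correctly? – ruakh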


I would expect using a variable instead of the LEFT JOIN to be faster:

SELECT
    `order`,
    IF(@previous_customer <> (@previous_customer := `customer`),
       `order`,
       NULL
    ) AS first_order
FROM orders
JOIN (SELECT @previous_customer := -1) x
ORDER BY customer, `order`;

Here is what it returns for my example on SQL Fiddle:

CUSTOMER  ORDER  FIRST_ORDER
1         1      1
1         2      (null)
1         3      (null)
2         4      4
2         5      (null)
3         6      6
4         7      7

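[Section 9.4 of the *MySQL Reference Manual*](http://dev.mysql.com/doc/refman/5.6/en/user-variables.html) advises against "assign[ing] a value to a user variable and read[ing] the value within the same statement", on the grounds that you are not guaranteed it will always give the results you expect (after changing MySQL versions, when the execution plan changes, etc.). – ruakh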
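On MySQL 8.0 or later (well after this thread was written), window functions avoid both the self-join and the user-variable trick the comment above warns about. A minimal sketch, assuming `order` values are unique: `MIN(...) OVER (PARTITION BY customer)` attaches each customer's first order to every one of that customer's rows, without joining `orders` to itself.

```sql
-- Requires window-function support (MySQL 8.0+). Each row is compared with
-- the smallest `order` of its own customer; only that row keeps a value,
-- every later order of the same customer gets NULL.
SELECT customer,
       `order`,
       CASE WHEN `order` = MIN(`order`) OVER (PARTITION BY customer)
            THEN `order`
       END AS first_order
FROM orders
ORDER BY customer, `order`;
```

On the sample rows from the SQL Fiddle answer, this should yield the same CUSTOMER / ORDER / FIRST_ORDER table shown there, but without depending on the evaluation order of user variables.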