2011-06-08

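How do I optimize or correctly write this MySQL query?

Today I noticed a very slow SQL query in my mysql-slow.log.

I'd like to ask some SQL experts how to format and run this SQL correctly.

The idea behind the SQL: return all emails from 2 tables that are not in the mailchimp table, and return only DISTINCT values (user and subscriber emails may be duplicated). Also include the city and language in the results.

As you can see, the Query_time is monstrously long, and the Rows_examined count is absurd: the 2 tables combined should have only about 20K rows.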

Query_time: 113.216544 Lock_time: 0.000180 Rows_sent: 43 Rows_examined: 208280841 

SELECT * FROM 
    (SELECT u.email AS email, u.city, u.language FROM users AS u 
     LEFT JOIN mailchimp AS m ON u.email = m.email WHERE m.email IS NULL GROUP BY u.email 
     UNION SELECT s.email AS email, s.city, s.language FROM subscribers AS s 
     LEFT JOIN mailchimp AS m ON s.email = m.email WHERE m.email IS NULL GROUP BY s.email) 
    AS sync GROUP BY sync.email ORDER BY sync.email ASC; 

EXPLAIN output for the query:

+----+--------------+------------+------+---------------+------+---------+------+-------+---------------------------------+ 
| id | select_type | table  | type | possible_keys | key | key_len | ref | rows | Extra       | 
+----+--------------+------------+------+---------------+------+---------+------+-------+---------------------------------+ 
| 1 | PRIMARY  | <derived2> | ALL | NULL   | NULL | NULL | NULL | 23 | Using temporary; Using filesort | 
| 2 | DERIVED  | u   | ALL | NULL   | NULL | NULL | NULL | 10482 | Using temporary; Using filesort | 
| 2 | DERIVED  | m   | ALL | NULL   | NULL | NULL | NULL | 11411 | Using where; Not exists   | 
| 3 | UNION  | s   | ALL | NULL   | NULL | NULL | NULL | 2709 | Using temporary; Using filesort | 
| 3 | UNION  | m   | ALL | NULL   | NULL | NULL | NULL | 11411 | Using where; Not exists   | 
| NULL | UNION RESULT | <union2,3> | ALL | NULL   | NULL | NULL | NULL | NULL |         | 
+----+--------------+------------+------+---------------+------+---------+------+-------+---------------------------------+ 
6 rows in set (2 min 1.65 sec) 
+4

Subselects, unions, and ORDER BY, oh my! – 2011-06-08 15:00:06

+0

Could you run these as two separate queries and do the sorting in code? That might be faster. – Ascherer 2011-06-08 15:00:49

+0

Please post the EXPLAIN output for that query as well. – jishi 2011-06-08 15:10:02

Answers

1

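My guess is that you have no indexes on the three tables. Add an index on the email field in all 3 tables (users, subscribers, mailchimp), then run the query, and the EXPLAIN, again.

Your query: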
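A minimal sketch of the index creation suggested above. It assumes email is a VARCHAR column in each table (a TEXT column would need a prefix length), and the index names are hypothetical; pick whatever fits your naming scheme.

```sql
-- Assumption: email is VARCHAR in all three tables.
-- The index names (idx_*_email) are made up for illustration.
CREATE INDEX idx_users_email       ON users (email);
CREATE INDEX idx_subscribers_email ON subscribers (email);
CREATE INDEX idx_mailchimp_email   ON mailchimp (email);

-- Re-check the plan for one branch of the anti-join afterwards.
-- With the index in place, the row for m should list a key
-- instead of NULL.
EXPLAIN
SELECT u.email, u.city, u.language
FROM users AS u
LEFT JOIN mailchimp AS m ON u.email = m.email
WHERE m.email IS NULL;
```

The index on mailchimp.email is the one the anti-join lookup relies on; the other two mainly help the grouping and sorting by email.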

SELECT * 
FROM 
    (SELECT u.email AS email, u.city, u.language 
    FROM users AS u 
     LEFT JOIN mailchimp AS m 
     ON u.email = m.email 
     WHERE m.email IS NULL 
     GROUP BY u.email 
    UNION 
    SELECT s.email AS email, s.city, s.language 
    FROM subscribers AS s 
    LEFT JOIN mailchimp AS m 
     ON s.email = m.email 
    WHERE m.email IS NULL 
    GROUP BY s.email 
) 
    AS sync 
GROUP BY sync.email 
ORDER BY sync.email ASC; 

It can be written like this (removing the two inner GROUP BY clauses and turning UNION into UNION ALL):

SELECT * 
FROM 
    (SELECT u.email AS email, u.city, u.language 
    FROM users AS u 
     LEFT JOIN mailchimp AS m 
     ON u.email = m.email 
     WHERE m.email IS NULL 
    UNION ALL 
    SELECT s.email AS email, s.city, s.language 
    FROM subscribers AS s 
    LEFT JOIN mailchimp AS m 
     ON s.email = m.email 
    WHERE m.email IS NULL 
) 
    AS sync 
GROUP BY sync.email 
ORDER BY sync.email ASC; 

Or like this (turning the LEFT JOIN / IS NULL check into NOT EXISTS), which is sometimes faster:

SELECT * 
FROM 
    (SELECT u.email AS email, u.city, u.language 
    FROM users AS u 
    WHERE NOT EXISTS 
     (SELECT * 
     FROM mailchimp AS m 
     WHERE u.email = m.email 
    ) 
    UNION ALL 
    SELECT s.email AS email, s.city, s.language 
    FROM subscribers AS s 
    WHERE NOT EXISTS 
     (SELECT * 
     FROM mailchimp AS m 
     WHERE s.email = m.email 
    ) 
) 
    AS sync 
GROUP BY sync.email 
ORDER BY sync.email ASC; 

In any case, add indexes on the email fields!

+0

Thanks, everyone. Now it's '6 rows (0.15 sec)' – arma 2011-06-08 15:40:48

+0

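@arma: Note that if an email is in both users and subscribers, or appears multiple times in either table, but those rows have a different city or 'language' stored, only one city and language will be shown. If you want multiple rows to be shown, remove the "GROUP BY". – 2011-06-08 15:46:17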

+0

Yes, I understand that I should rewrite the code so that emails are unique, and remove the PHP check. Thanks again for the huge fix from 2m15sec to 0.15sec. – arma 2011-06-08 15:51:42

1

Does this help without messing up your results? I added UNION ALL; a plain UNION is a waste of cycles because you are already grouping in the outer query.

SELECT * FROM 
    (SELECT u.email AS email, u.city, u.language FROM users AS u 
     LEFT JOIN mailchimp AS m ON u.email = m.email WHERE m.email IS NULL GROUP BY u.email 
     UNION ALL 
     SELECT s.email AS email, s.city, s.language FROM subscribers AS s 
     LEFT JOIN mailchimp AS m ON s.email = m.email WHERE m.email IS NULL GROUP BY s.email) 
    AS sync GROUP BY sync.email ORDER BY sync.email ASC; 
3

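Notice that there are no keys available in the explain plan. That makes performance terrible. For each users record, you have to scan the entire mailchimp table. Then, for each subscribers record, you scan the entire mailchimp table again. You are doing roughly 10482 * 11411 + 2709 * 11411 reads.

Maybe a MySQL expert can chime in here, but as I understand the MySQL documentation, it does not perform hash matches the way some other database engines do. Everything is loop and match.

You can improve performance significantly by creating an index on mailchimp.email.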
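To put a number on that estimate, the arithmetic can be run directly in MySQL. The row counts are the ones from the EXPLAIN output in the question, so this is a rough nested-loop estimate rather than an exact count; the column alias is made up for illustration.

```sql
-- Rough nested-loop cost: every users row and every subscribers row
-- triggers a full scan of mailchimp (row estimates taken from EXPLAIN).
SELECT 10482 * 11411 + 2709 * 11411 AS estimated_row_reads;
-- returns 150522501
```

That is about 150 million reads, the same order of magnitude as the Rows_examined: 208280841 reported in the slow log.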