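Optimizing a slow MySQL SELECT query
2015-03-08
6

EDIT: After reading some of the answers here and hours of research, my team concluded that there is most likely no way to optimize this beyond the 4.5 seconds we were able to reach (short of partitioning offers_clicks, which would have some ugly side effects). In the end, after a lot of brainstorming, we decided to split the query in two. We build two sets of user IDs (one from the users table, the other from offers_clicks) and compare them using Python sets. The set of IDs from the users table is still pulled from SQL, but we moved offers_clicks to Lucene and added some caching on top of it, and that is now where the other set of IDs comes from. The end result: about half a second with the cache, and 0.9 seconds without it.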
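The two-sets approach from the edit could look roughly like the sketch below. It is only a sketch under assumptions: an in-memory SQLite database stands in for MySQL, and `clicked_user_ids_stub()` is a hypothetical placeholder for the Lucene-backed, cached lookup of user IDs with qualifying clicks. The function names and sample rows are made up and do not come from the question.

```python
import sqlite3


def active_us_user_ids(conn, since):
    """First set: IDs of US users active after `since`, still pulled from SQL."""
    rows = conn.execute(
        "SELECT id FROM users WHERE country = ? AND last_active > ?",
        ("US", since),
    )
    return {row[0] for row in rows}


def clicked_user_ids_stub():
    """Hypothetical stand-in for the Lucene + cache lookup returning the user_ids
    that have a click with date > '2015-02-14' and 0.24 < ranking_score < 3.49."""
    return {2, 3, 5, 8}


# Toy users table (SQLite in-memory, standing in for MySQL).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, country TEXT, last_active TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [
        (1, "US", "2015-03-01 10:00:00"),
        (2, "US", "2015-03-02 12:00:00"),
        (3, "US", "2015-02-20 09:00:00"),  # last active before the cutoff
        (4, "DE", "2015-03-03 08:00:00"),  # not a US user
        (5, "US", "2015-03-04 18:30:00"),
    ],
)

active_ids = active_us_user_ids(conn, "2015-02-26")
# The set intersection takes the place of the SQL JOIN + count(DISTINCT ...).
count_1 = len(active_ids & clicked_user_ids_stub())
print(count_1)  # 2
```

Because each side is a plain set, either one can be cached independently of the other.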

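Original post: I can't get this query optimized. The first version of the query performs fine, but once offers_clicks is joined in (the second query), it becomes quite slow. The users table contains 10 million rows and offers_clicks contains 53 million rows.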

Acceptable performance:

SELECT count(distinct(users.id)) AS count_1 
FROM users USE index (country_2) 
WHERE users.country = 'US' 
    AND users.last_active > '2015-02-26'; 
1 row in set (0.35 sec) 

Bad:

SELECT count(distinct(users.id)) AS count_1 
FROM offers_clicks USE index (user_id_3), users USE index (country_2) 
WHERE users.country = 'US' 
    AND users.last_active > '2015-02-26' 
    AND offers_clicks.user_id = users.id 
    AND offers_clicks.date > '2015-02-14' 
    AND offers_clicks.ranking_score < 3.49 
    AND offers_clicks.ranking_score > 0.24; 
1 row in set (7.39 sec) 

Here is what it looks like without specifying any indexes (even worse):

SELECT count(distinct(users.id)) AS count_1 
FROM offers_clicks, users 
WHERE users.country IN ('US') 
    AND users.last_active > '2015-02-26' 
    AND offers_clicks.user_id = users.id 
    AND offers_clicks.date > '2015-02-14' 
    AND offers_clicks.ranking_score < 3.49 
    AND offers_clicks.ranking_score > 0.24; 
1 row in set (17.72 sec) 

EXPLAIN:

explain SELECT count(distinct(users.id)) AS count_1 FROM offers_clicks USE index (user_id_3), users USE index (country_2) WHERE users.country IN ('US') AND users.last_active > '2015-02-26' AND offers_clicks.user_id = users.id AND offers_clicks.date > '2015-02-14' AND offers_clicks.ranking_score < 3.49 AND offers_clicks.ranking_score > 0.24; 
+----+-------------+---------------+-------+---------------+-----------+---------+------------------------------+--------+--------------------------+
| id | select_type | table         | type  | possible_keys | key       | key_len | ref                          | rows   | Extra                    |
+----+-------------+---------------+-------+---------------+-----------+---------+------------------------------+--------+--------------------------+
|  1 | SIMPLE      | users         | range | country_2     | country_2 | 14      | NULL                         | 245014 | Using where; Using index |
|  1 | SIMPLE      | offers_clicks | ref   | user_id_3     | user_id_3 | 4       | dejong_pointstoshop.users.id | 270153 | Using where; Using index |
+----+-------------+---------------+-------+---------------+-----------+---------+------------------------------+--------+--------------------------+

EXPLAIN without specifying any indexes:

mysql> explain SELECT count(distinct(users.id)) AS count_1 FROM offers_clicks, users WHERE users.country IN ('US') AND users.last_active > '2015-02-26' AND offers_clicks.user_id = users.id AND offers_clicks.date > '2015-02-14' AND offers_clicks.ranking_score < 3.49 AND offers_clicks.ranking_score > 0.24; 
+----+-------------+---------------+-------+------------------------------------------------------------------------+-----------+---------+------------------------------+--------+--------------------------+
| id | select_type | table         | type  | possible_keys                                                          | key       | key_len | ref                          | rows   | Extra                    |
+----+-------------+---------------+-------+------------------------------------------------------------------------+-----------+---------+------------------------------+--------+--------------------------+
|  1 | SIMPLE      | users         | range | PRIMARY,last_active,country,last_active_2,country_2                    | country_2 | 14      | NULL                         | 221606 | Using where; Using index |
|  1 | SIMPLE      | offers_clicks | ref   | user_id,user_id_2,date,date_2,date_3,ranking_score,user_id_3,user_id_4 | user_id_2 | 4       | dejong_pointstoshop.users.id |      3 | Using where              |
+----+-------------+---------------+-------+------------------------------------------------------------------------+-----------+---------+------------------------------+--------+--------------------------+

Here are the indexes (I've tried a whole bunch of them without much success):

+---------------+------------+-----------+--------------+---------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table         | Non_unique | Key_name  | Seq_in_index | Column_name   | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+---------------+------------+-----------+--------------+---------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| offers_clicks |          1 | user_id_3 |            1 | user_id       | A         |         198 | NULL     | NULL   |      | BTREE      |         |               |
| offers_clicks |          1 | user_id_3 |            2 | ranking_score | A         |         198 | NULL     | NULL   |      | BTREE      |         |               |
| offers_clicks |          1 | user_id_3 |            3 | date          | A         |         198 | NULL     | NULL   |      | BTREE      |         |               |
| offers_clicks |          1 | user_id_2 |            1 | user_id       | A         |    17838712 | NULL     | NULL   |      | BTREE      |         |               |
| offers_clicks |          1 | user_id_2 |            2 | date          | A         |    53516137 | NULL     | NULL   |      | BTREE      |         |               |
| offers_clicks |          1 | user_id_4 |            1 | user_id       | A         |         198 | NULL     | NULL   |      | BTREE      |         |               |
| offers_clicks |          1 | user_id_4 |            2 | date          | A         |         198 | NULL     | NULL   |      | BTREE      |         |               |
| offers_clicks |          1 | user_id_4 |            3 | ranking_score | A         |         198 | NULL     | NULL   |      | BTREE      |         |               |
| users         |          1 | country_2 |            1 | country       | A         |          14 | NULL     | NULL   |      | BTREE      |         |               |
| users         |          1 | country_2 |            2 | last_active   | A         |     8048529 | NULL     | NULL   |      | BTREE      |         |               |
+---------------+------------+-----------+--------------+---------------+-----------+-------------+----------+--------+------+------------+---------+---------------+

Simplified users schema:

+-------------+----------+------+-----+---------------------+----------------+
| Field       | Type     | Null | Key | Default             | Extra          |
+-------------+----------+------+-----+---------------------+----------------+
| id          | int(11)  | NO   | PRI | NULL                | auto_increment |
| country     | char(2)  | NO   | MUL |                     |                |
| last_active | datetime | NO   | MUL | 2000-01-01 00:00:00 |                |
+-------------+----------+------+-----+---------------------+----------------+

Simplified offers_clicks schema:

+---------------+------------------+------+-----+---------------------+----------------+
| Field         | Type             | Null | Key | Default             | Extra          |
+---------------+------------------+------+-----+---------------------+----------------+
| id            | int(11)          | NO   | PRI | NULL                | auto_increment |
| user_id       | int(11)          | NO   | MUL | 0                   |                |
| offer_id      | int(11) unsigned | NO   | MUL | NULL                |                |
| date          | datetime         | NO   | MUL | 0000-00-00 00:00:00 |                |
| ranking_score | decimal(5,2)     | NO   | MUL | 0.00                |                |
+---------------+------------------+------+-----+---------------------+----------------+
+1

Please post your schema! – 2015-03-08 21:33:52

+0

Eugen Rieck: done! – 2015-03-09 02:34:27

+1

Note that DISTINCT is not a function – Strawberry 2015-03-12 07:39:15

Answers

5

Here is your query:

SELECT count(distinct u.id) AS count_1 
FROM offers_clicks oc JOIN 
    users u 
    ON oc.user_id = u.id 
WHERE u.country IN ('US') AND u.last_active > '2015-02-26' AND 
     oc.date > '2015-02-14' AND 
     oc.ranking_score > 0.24 AND oc.ranking_score < 3.49; 

First, instead of count(distinct), you might consider writing the query as:

SELECT count(*) AS count_1 
FROM users u 
WHERE u.country IN ('US') AND u.last_active > '2015-02-26' AND 
     EXISTS (SELECT 1 
       FROM offers_clicks oc 
       WHERE oc.user_id = u.id AND 
        oc.date > '2015-02-14' AND 
        oc.ranking_score > 0.24 AND oc.ranking_score < 3.49 
      ) 

Then, the best indexes for this query are users(country, last_active, id) and either offers_clicks(user_id, date, ranking_score) or offers_clicks(user_id, ranking_score, date).
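Here is a check that the EXISTS rewrite returns the same count as the original JOIN with count(DISTINCT ...), run on made-up rows. It uses Python's built-in sqlite3 module as a stand-in for MySQL, so it says nothing about MySQL performance. It only shows that the two forms agree, including when a single user has several qualifying clicks. The index names are hypothetical; the column orders follow the suggestion above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, country TEXT, last_active TEXT);
CREATE TABLE offers_clicks (id INTEGER PRIMARY KEY, user_id INTEGER,
                            date TEXT, ranking_score REAL);
-- Indexes with the column orders suggested in the answer (names are made up).
CREATE INDEX users_country_active_id ON users (country, last_active, id);
CREATE INDEX oc_user_date_score ON offers_clicks (user_id, date, ranking_score);

INSERT INTO users VALUES
  (1, 'US', '2015-03-01'), (2, 'US', '2015-03-02'), (3, 'US', '2015-02-20'),
  (4, 'DE', '2015-03-03'), (5, 'US', '2015-03-04');
INSERT INTO offers_clicks (user_id, date, ranking_score) VALUES
  (1, '2015-02-20', 1.00), (1, '2015-02-21', 2.00),  -- two qualifying clicks
  (2, '2015-02-10', 1.00), (2, '2015-02-25', 3.49),  -- too old / score not < 3.49
  (3, '2015-02-20', 1.00),                           -- user not recently active
  (4, '2015-02-20', 1.00),                           -- user not in the US
  (5, '2015-02-20', 0.50);
""")

join_distinct = """
SELECT count(DISTINCT u.id) FROM offers_clicks oc JOIN users u ON oc.user_id = u.id
WHERE u.country = 'US' AND u.last_active > '2015-02-26'
  AND oc.date > '2015-02-14' AND oc.ranking_score > 0.24 AND oc.ranking_score < 3.49
"""
exists_form = """
SELECT count(*) FROM users u
WHERE u.country = 'US' AND u.last_active > '2015-02-26'
  AND EXISTS (SELECT 1 FROM offers_clicks oc
              WHERE oc.user_id = u.id AND oc.date > '2015-02-14'
                AND oc.ranking_score > 0.24 AND oc.ranking_score < 3.49)
"""
(join_count,) = conn.execute(join_distinct).fetchone()
(exists_count,) = conn.execute(exists_form).fetchone()
print(join_count, exists_count)  # 2 2
```

The EXISTS form can stop probing offers_clicks at the first matching click per user. The JOIN form produces every matching click and then removes duplicates.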

+0

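I tried this with users(country, last_active) and offers_clicks(user_id, date, ranking_score). The speed is about the same: 1 row in set (6.45 sec). How important is id in the composite index on the users table? I'd like to understand how it affects the query. I can try adding an index on (country, last_active, id) tomorrow and see how that affects things. – 2015-03-09 02:20:48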

+0

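Can you try this query with = 'US' instead of IN? IN might be preventing the optimizer from using the index. 'user_id' isn't *that* important. It just allows the index to be a covering index, so the engine doesn't have to fetch data from the data pages. – 2015-03-09 02:24:47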

+0

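Thanks Gordon; I'll try adding "id" to the composite index on the users table tomorrow. I had also tried = 'US' earlier, and it didn't seem to make any difference (not properly benchmarked, but the speed looked about the same). – 2015-03-09 02:31:45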
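On how much id matters in the users index: "Using index" in the Extra column of the first EXPLAIN output means MySQL is already reading users from country_2 alone. If users is an InnoDB table (the question doesn't say), that is expected. InnoDB secondary indexes store the primary key, so (country, last_active) already covers id. SQLite behaves similarly with its rowid, and the sketch below uses that to show the idea. The table, the index name and the extra name column are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# `id` is an INTEGER PRIMARY KEY, i.e. an alias for SQLite's rowid, which every
# index entry stores anyway (similar to InnoDB storing the PK in secondary indexes).
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, country TEXT, last_active TEXT, name TEXT)"
)
conn.execute("CREATE INDEX users_country_active ON users (country, last_active)")

where = "WHERE country = 'US' AND last_active > '2015-02-26'"


def plan(sql):
    """Return the detail column of each EXPLAIN QUERY PLAN row."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


# Only the indexed columns plus id are needed, so the index alone answers this.
ids_plan = plan(f"SELECT count(DISTINCT id) FROM users {where}")
# `name` is not in the index, so the table rows themselves have to be read.
names_plan = plan(f"SELECT count(name) FROM users {where}")

print(ids_plan)
print(names_plan)
```

The exact wording of the plan text differs between SQLite versions. Look for "COVERING INDEX" in the first plan and its absence in the second.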

0
SELECT count(users.id) AS count_1 
FROM users 
INNER JOIN 
    (SELECT 
    DISTINCT user_id 
    FROM 
    offers_clicks 
    WHERE offers_clicks.date > '2015-02-14' 
    AND offers_clicks.ranking_score < 3.49 
    AND offers_clicks.ranking_score > 0.24 
) as clicks 
ON clicks.user_id = users.id 
WHERE users.country IN ('US') 
    AND users.last_active > '2015-02-26' 
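The derived-table version can use a plain count(users.id) because the inner SELECT DISTINCT already returns each user_id at most once. Here is a quick sanity check on invented rows, again with sqlite3 standing in for MySQL. It shows the results match, not the timings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, country TEXT, last_active TEXT);
CREATE TABLE offers_clicks (id INTEGER PRIMARY KEY, user_id INTEGER,
                            date TEXT, ranking_score REAL);
INSERT INTO users VALUES (1, 'US', '2015-03-01'), (2, 'US', '2015-03-02'),
                         (3, 'US', '2015-02-20');
INSERT INTO offers_clicks (user_id, date, ranking_score) VALUES
  (1, '2015-02-20', 1.00), (1, '2015-02-21', 2.00),  -- same user, two clicks
  (2, '2015-02-22', 0.50),
  (3, '2015-02-20', 1.00);                           -- user not recently active
""")

click_filters = "date > '2015-02-14' AND ranking_score < 3.49 AND ranking_score > 0.24"
user_filters = "users.country IN ('US') AND users.last_active > '2015-02-26'"

original = f"""
SELECT count(DISTINCT users.id) FROM offers_clicks, users
WHERE {user_filters} AND offers_clicks.user_id = users.id AND {click_filters}"""

derived = f"""
SELECT count(users.id) FROM users
INNER JOIN (SELECT DISTINCT user_id FROM offers_clicks WHERE {click_filters}) AS clicks
  ON clicks.user_id = users.id
WHERE {user_filters}"""

(orig_count,) = conn.execute(original).fetchone()
(derived_count,) = conn.execute(derived).fetchone()
print(orig_count, derived_count)  # 2 2
```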

Could you provide an sqlfiddle with some data?

Also, can you tell me the execution time of this query:

SELECT 
    DISTINCT user_id 
    FROM 
    offers_clicks 
    WHERE offers_clicks.date > '2015-02-14' 
    AND offers_clicks.ranking_score < 3.49 
    AND offers_clicks.ranking_score > 0.24 

EDIT: And how long does this one take?

SELECT 
    DISTINCT user_id 
    FROM 
    offers_clicks USE INDEX (user_id_4) 
    WHERE offers_clicks.date > '2015-02-14' 
    AND offers_clicks.ranking_score < 3.49 
    AND offers_clicks.ranking_score > 0.24 
+0

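I'll try setting up an sqlfiddle tomorrow. The offers_clicks query alone takes about 4-5 seconds, almost as slow as the query including users (which runs in about 5-6 seconds, roughly 1-2 seconds faster than the original query). – 2015-03-12 21:19:54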

+0

Here's the EXPLAIN for the offers_clicks query, btw: 1 | SIMPLE | offers_clicks | range | date,date_2,date_3,ranking_score | date_2 | 8 | NULL | 2738102 | Using where; Using temporary | – 2015-03-12 21:49:34

+0

But does it return the correct result? And it's better (5-6 s) than before (17-18 s)? So now it just needs to get under 1 second? – Alex 2015-03-13 14:14:47

1
SELECT count(distinct u.id) AS count_1 
FROM users u 
STRAIGHT_JOIN offers_clicks oc 
    ON oc.user_id = u.id 
WHERE 
    u.country IN ('US') 
    AND u.last_active > '2015-02-26' 
    AND oc.date > '2015-02-14' 
    AND oc.ranking_score > 0.24 
    AND oc.ranking_score < 3.49; 

Make sure you have indexes on users (id, last_active, country) and offers_clicks (user_id, date, ranking_score).

Or you can reverse the order:

SELECT count(distinct u.id) AS count_1 
FROM offers_clicks oc 
STRAIGHT_JOIN users u 
    ON oc.user_id = u.id 
WHERE 
    u.country IN ('US') 
    AND u.last_active > '2015-02-26' 
    AND oc.date > '2015-02-14' 
    AND oc.ranking_score > 0.24 
    AND oc.ranking_score < 3.49; 

Make sure you have indexes on offers_clicks (user_id) and users (id, last_active, country).
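STRAIGHT_JOIN is MySQL-specific: it makes the left table the outer loop of the join. SQLite has no STRAIGHT_JOIN, but its documentation states that the left table of a CROSS JOIN is always placed in an outer loop relative to the right table. That makes it usable as an analogy for the two join orders proposed above. This is sqlite3, not MySQL, and the table and index definitions are assumptions loosely modeled on the question's schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, country TEXT, last_active TEXT);
CREATE TABLE offers_clicks (id INTEGER PRIMARY KEY, user_id INTEGER,
                            date TEXT, ranking_score REAL);
CREATE INDEX oc_user_id ON offers_clicks (user_id);
""")


def join_order(left, right):
    """Tables in the order the query plan visits them (outer loop first)."""
    sql = f"""EXPLAIN QUERY PLAN
    SELECT count(DISTINCT users.id) FROM {left} CROSS JOIN {right}
    WHERE offers_clicks.user_id = users.id
      AND users.country = 'US' AND users.last_active > '2015-02-26'
      AND offers_clicks.date > '2015-02-14'"""
    order = []
    for row in conn.execute(sql):
        detail = row[-1]
        for table in ("offers_clicks", "users"):
            # Whole-word match, so index names such as oc_user_id don't count.
            if table in detail.split() and table not in order:
                order.append(table)
    return order


print(join_order("users", "offers_clicks"))  # ['users', 'offers_clicks']
print(join_order("offers_clicks", "users"))  # ['offers_clicks', 'users']
```

In MySQL, the equivalent check is the row order in EXPLAIN, as in the outputs shown in the question.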

0

Try doing it the other way around:

SELECT COUNT(users.id) 
    FROM users, offers_clicks 
    WHERE users.country = 'US' 
     AND users.last_active > '2015-02-26' 
     AND offers_clicks.user_id = users.id 
     AND offers_clicks.date > '2015-02-14' 
     AND offers_clicks.ranking_score < 3.49 
     AND offers_clicks.ranking_score > 0.24; 
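One caution about this version: without DISTINCT, COUNT(users.id) counts joined rows (one per qualifying click), not users. It can therefore differ from count_1 in the original query. Here is a small sqlite3 sketch with made-up rows that shows the difference:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, country TEXT, last_active TEXT);
CREATE TABLE offers_clicks (id INTEGER PRIMARY KEY, user_id INTEGER,
                            date TEXT, ranking_score REAL);
INSERT INTO users VALUES (1, 'US', '2015-03-01'), (2, 'US', '2015-03-02');
INSERT INTO offers_clicks (user_id, date, ranking_score) VALUES
  (1, '2015-02-20', 1.00), (1, '2015-02-21', 1.50), (1, '2015-02-22', 2.00),
  (2, '2015-02-20', 1.00);
""")

conditions = """
WHERE users.country = 'US' AND users.last_active > '2015-02-26'
  AND offers_clicks.user_id = users.id AND offers_clicks.date > '2015-02-14'
  AND offers_clicks.ranking_score < 3.49 AND offers_clicks.ranking_score > 0.24"""

(per_click,) = conn.execute(
    "SELECT COUNT(users.id) FROM users, offers_clicks" + conditions
).fetchone()
(per_user,) = conn.execute(
    "SELECT COUNT(DISTINCT users.id) FROM users, offers_clicks" + conditions
).fetchone()
print(per_click, per_user)  # 4 2
```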
0

Try this:

SELECT count(distinct users.id) AS count_1 
FROM users USE index (<see below>) 
JOIN offers_clicks USE index (<see below>) 
    ON offers_clicks.user_id = users.id 
    AND offers_clicks.date BETWEEN '2015-02-14' AND CURRENT_DATE 
    AND offers_clicks.ranking_score BETWEEN 0.24 AND 3.49 
WHERE users.country = 'US' 
AND users.last_active BETWEEN '2015-02-26' AND CURRENT_DATE 

Make sure there are indexes on users(country, last_active, id) and offers_clicks(user_id, ranking_score, date), and put them in the USE INDEX hints.

Let me know how it performs. If it works, I'll explain why.
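BETWEEN is inclusive on both ends, so this version does not apply exactly the same filter as the original. First, ranking_score BETWEEN 0.24 AND 3.49 also keeps scores of exactly 0.24 and 3.49, both representable in decimal(5,2). Second, date is a datetime, so date BETWEEN '2015-02-14' AND CURRENT_DATE compares against midnight of the current day and drops clicks made later that day. The sketch below shows both effects in sqlite3. To keep it deterministic, the fixed literal '2015-03-09' stands in for CURRENT_DATE. That literal and the rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE offers_clicks (id INTEGER PRIMARY KEY, date TEXT, ranking_score REAL);
-- Scores are written as SQL literals, just like the bounds in the queries below.
INSERT INTO offers_clicks VALUES
  (1, '2015-02-20 12:00:00', 0.24),  -- score exactly on the lower bound
  (2, '2015-02-20 12:00:00', 3.49),  -- score exactly on the upper bound
  (3, '2015-02-20 12:00:00', 1.00),  -- comfortably inside both ranges
  (4, '2015-03-09 10:00:00', 1.00);  -- "today", after midnight
""")


def ids(where):
    """IDs of the clicks matching a WHERE clause."""
    return {row[0] for row in conn.execute("SELECT id FROM offers_clicks WHERE " + where)}


strict = ids("date > '2015-02-14' AND ranking_score > 0.24 AND ranking_score < 3.49")
# '2015-03-09' plays the role of CURRENT_DATE here.
between = ids("date BETWEEN '2015-02-14' AND '2015-03-09' "
              "AND ranking_score BETWEEN 0.24 AND 3.49")
print(sorted(strict), sorted(between))  # [3, 4] [1, 2, 3]
```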

0

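First, I also think you should use an explicit JOIN, and try to join only what you actually need for the result.
As for the offers_clicks table, I think you shouldn't use the index user_id_3 but user_id_2 instead. According to your index listing, user_id_2 has a much higher cardinality than user_id_3, so it should be faster.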

SELECT 
    count(distinct(users.id)) AS count_1 
FROM users USE INDEX (country_2) 
JOIN offers_clicks USE INDEX (user_id_2) 
    ON offers_clicks.user_id = users.id 
    AND offers_clicks.date > '2015-02-14' 
    AND offers_clicks.ranking_score < 3.49 
    AND offers_clicks.ranking_score > 0.24 
WHERE users.country = 'US' AND users.last_active > '2015-02-26' 
; 

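This query doesn't require any changes to your tables, which is why I think it's worth a try.
Narrowing the date range might also help: fewer matching rows should make it faster.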

Not sure this will help...
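The cardinality argument can be made concrete with a rough rule of thumb. For an equality lookup on an index prefix, the optimizer expects about (rows in table) / (cardinality of that prefix) matching rows. Both user_id_2 and user_id_3 start with user_id, yet the listing reports 17,838,712 for one and 198 for the other. That hints that the statistics for user_id_3 and user_id_4 may be out of date (InnoDB cardinality figures are sampled estimates, and ANALYZE TABLE recomputes them). The arithmetic below assumes the table holds about 53,516,137 rows, which is the largest cardinality in the listing and matches the "53 million" from the question. It closely reproduces the rows column of the two EXPLAIN outputs: 270153 with user_id_3 and 3 with user_id_2.

```python
def rows_per_lookup(table_rows, prefix_cardinality):
    """Rough estimate of rows matched per equality lookup on an index prefix."""
    return table_rows / prefix_cardinality


# Assumption: row count taken from the largest cardinality in the index listing.
TABLE_ROWS = 53_516_137

via_user_id_2 = rows_per_lookup(TABLE_ROWS, 17_838_712)  # user_id_2, Seq_in_index 1
via_user_id_3 = rows_per_lookup(TABLE_ROWS, 198)         # user_id_3, Seq_in_index 1
print(round(via_user_id_2), round(via_user_id_3))  # 3 270284
```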