2012-03-14 66 views
0

我正在制作留言板,并试图检索常规主题(即未粘贴主题),并按最后发布的消息日期对其进行排序。但是,当我有大约10,000条消息和1500个主题时,我能够完成此任务,查询时间大于60秒。使用多连接和大记录集时的SQL查询优化

我的问题是,有什么我可以做我的查询来提高性能或是我的设计从根本上有缺陷?

这是我正在使用的查询。

SELECT Messages.topic_id, 
     Messages.posted, 
     Topics.title, 
     Topics.user_id, 
     Users.username 
FROM Messages 
LEFT JOIN 
    Topics USING(topic_id) 
LEFT JOIN 
    Users on Users.user_id = Topics.user_id 
WHERE Messages.message_id IN (
    SELECT MAX(message_id) 
    FROM Messages 
    GROUP BY topic_id) 
AND Messages.topic_id 
NOT IN (
    SELECT topic_id 
    FROM StickiedTopics) 
AND Messages.posted IN (       
    SELECT MIN(posted) 
    FROM Messages 
    GROUP BY message_id) 
AND Topics.board_id=1 
ORDER BY Messages.posted DESC LIMIT 50 

编辑这里是解释计划

+----+--------------------+----------------+----------------+------------------+----------+---------+-------------------------+------+----------------------------------------------+ 
| id | select_type  | table   | type   | possible_keys | key  | key_len | ref      | rows | Extra          | 
+----+--------------------+----------------+----------------+------------------+----------+---------+-------------------------+------+----------------------------------------------+ 
| 1 | PRIMARY   | Topics   | ref   | PRIMARY,board_id | board_id | 4  | const     | 641 | Using where; Using temporary; Using filesort | 
| 1 | PRIMARY   | Users   | eq_ref   | PRIMARY   | PRIMARY | 4  | spergs3.Topics.user_id | 1 |            | 
| 1 | PRIMARY   | Messages  | ref   | topic_id   | topic_id | 4  | spergs3.Topics.topic_id | 3 | Using where         | 
| 4 | DEPENDENT SUBQUERY | Messages  | index   | NULL    | PRIMARY | 8  | NULL     | 1 |            | 
| 3 | DEPENDENT SUBQUERY | StickiedTopics | index_subquery | topic_id   | topic_id | 4  | func     | 1 | Using index         | 
| 2 | DEPENDENT SUBQUERY | Messages  | index   | NULL    | topic_id | 4  | NULL     | 3 | Using index         | 
+----+--------------------+----------------+----------------+------------------+----------+---------+-------------------------+------+----------------------------------------------+ 

指标

+----------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+ 
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | 
+----------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+ 
| Messages |   0 | PRIMARY |   1 | message_id | A   |  9956 |  NULL | NULL |  | BTREE  |   | 
| Messages |   0 | PRIMARY |   2 | revision_no | A   |  9956 |  NULL | NULL |  | BTREE  |   | 
| Messages |   1 | user_id |   1 | user_id  | A   |   432 |  NULL | NULL |  | BTREE  |   | 
| Messages |   1 | topic_id |   1 | topic_id | A   |  3318 |  NULL | NULL |  | BTREE  |   | 
+----------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+ 

+--------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+ 
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | 
+--------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+ 
| Topics |   0 | PRIMARY |   1 | topic_id | A   |  1205 |  NULL | NULL |  | BTREE  |   | 
| Topics |   1 | user_id |   1 | user_id  | A   |   133 |  NULL | NULL |  | BTREE  |   | 
| Topics |   1 | board_id |   1 | board_id | A   |   1 |  NULL | NULL |  | BTREE  |   | 
+--------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+ 

+-------+------------+-----------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+ 
| Table | Non_unique | Key_name  | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | 
+-------+------------+-----------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+ 
| Users |   0 | PRIMARY   |   1 | user_id  | A   |  2051 |  NULL | NULL |  | BTREE  |   | 
| Users |   0 | username_UNIQUE |   1 | username | A   |  2051 |  NULL | NULL |  | BTREE  |   | 
+-------+------------+-----------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+ 
+0

您至少应该发布解释计划。 – EvilTeach 2012-03-14 02:07:32

+0

您使用的是什么RDBMS? – squillman 2012-03-14 02:09:47

+0

没有查询计划=没有优化(尽管可能猜测应该存在的索引,哪些*也需要在问题中显示)。你有没有考虑用适当的'JOIN'替换'IN'? (我不使用MySQL,所以我不知道它是什么: - /) – 2012-03-14 02:14:39

回答

2

我会从合格主题的第一个基础开始,获取这些ID,然后再加入。 我的内部第一个查询按照topic_id和max消息进行了预先限定的分组,以获取预先限定的不同ID。我也已将LEFT JOIN应用于stickiesTopics。为什么?通过进行左连接,我可以查找找到的那些(您想要排除的那些)。所以我已经为Stickies应用了WHERE子句主题ID为NULL(即:未找到)。所以通过这样做,我们已经明确地将列表配对了,而不需要做几个嵌套的子查询。从这个结果中,我们可以加入消息,主题(包括board_id = 1的限定符),用户并根据需要获取部分。最后,为您的MIN(发布)限定符应用一个WHERE IN子选择。不了解其基础,但将其作为原始查询的一部分留下。然后按顺序和限制。

SELECT STRAIGHT_JOIN 
     M.topic_id, 
     M.posted, 
     T.title, 
     T.user_id, 
     U.username 
    FROM 
     (select 
       M1.Topic_ID, 
       MAX(M1.Message_id) MaxMsgPerTopic 
      from 
       Messages M1 
       LEFT Join StickiedTopics ST 
        ON M1.Topic_ID = ST.Topic_ID 
      where 
       ST.Topic_ID IS NULL 
      group by 
       M1.Topic_ID) PreQuery 
     JOIN Messages M 
      ON PreQuery.MaxMsgPerTopic = M.Message_ID 
      JOIN Topics T 
       ON M.Topic_ID = T.Topic_ID 
       AND T.Board_ID = 1 
       LEFT JOIN Users U 
       on T.User_ID = U.user_id 
    WHERE 
     M.posted IN (SELECT MIN(posted) 
         FROM Messages 
         GROUP BY message_id) 
    ORDER BY 
     M.posted DESC 
    LIMIT 50 
+0

这很好,非常感谢!它将查询时间缩短到20秒,并将硬件增加到了4点。好得多! – Drew 2012-03-16 04:51:32

1

我猜你的问题的很大一部分就在于你的子查询。尝试这样的:

SELECT Messages.topic_id, 
     Messages.posted, 
     Topics.title, 
     Topics.user_id, 
     Users.username 
FROM Messages 
LEFT JOIN 
    Topics USING(topic_id) 
LEFT JOIN 
    StickiedTopics ON StickiedTopics.topic_id = Topics.topic_id 
        AND StickedTopics.topic_id IS NULL 
LEFT JOIN 
    Users on Users.user_id = Topics.user_id 
WHERE Messages.message_id IN (
    SELECT MAX(message_id) 
    FROM Messages m1 
    WHERE m1.topic_id = Messages.topic_id) 
AND Messages.posted IN (       
    SELECT MIN(posted)                       
    FROM Messages m2 
    GROUP BY message_id) 
AND Topics.board_id=1 
ORDER BY Messages.posted DESC LIMIT 50 

我优化了第一个子查询通过删除分组。第二个子查询是不必要的,因为它可以替换为JOIN

我不太清楚这是什么第三子查询是应该做的:

AND Messages.posted IN (       
    SELECT MIN(posted)                       
    FROM Messages m2 
    GROUP BY message_id) 

我也许能帮助优化,如果我知道什么是应该做的。 posted究竟是什么 - 日期,整数等?它代表什么?

+0

Messages.posted是一个unix时间戳。留言板还支持编辑并保留修订历史记录,以便此查询将获取最早修订的日期。 – Drew 2012-03-16 04:53:54