2011-04-14
0

How to optimize a MySQL query that searches for rows in a specific date range

Here is the query:

select timespans.id as timespan_id, count(*) as num 
from reports, timespans 
where timespans.after_date >= '2011-04-13 22:08:38' and 
     timespans.after_date <= reports.authored_at and 
     reports.authored_at < timespans.before_date 
group by timespans.id; 

Here are the table definitions:

 
CREATE TABLE `reports` (
    `id` int(11) NOT NULL auto_increment, 
    `source_id` int(11) default NULL, 
    `url` varchar(255) default NULL, 
    `lat` decimal(20,15) default NULL, 
    `lng` decimal(20,15) default NULL, 
    `content` text, 
    `notes` text, 
    `authored_at` datetime default NULL, 
    `created_at` datetime default NULL, 
    `updated_at` datetime default NULL, 
    `data` text, 
    `title` varchar(255) default NULL, 
    `author_id` int(11) default NULL, 
    `orig_id` varchar(255) default NULL, 
    PRIMARY KEY (`id`), 
    KEY `index_reports_on_title` (`title`), 
    KEY `index_content_on_reports` (`content`(128))
);

CREATE TABLE `timespans` (
    `id` int(11) NOT NULL auto_increment, 
    `after_date` datetime default NULL, 
    `before_date` datetime default NULL, 
    `after_offset` int(11) default NULL, 
    `before_offset` int(11) default NULL, 
    `is_common` tinyint(1) default NULL, 
    `created_at` datetime default NULL, 
    `updated_at` datetime default NULL, 
    `is_search_chunk` tinyint(1) default NULL, 
    `is_day` tinyint(1) default NULL, 
    PRIMARY KEY (`id`), 
    KEY `index_timespans_on_after_date` (`after_date`), 
    KEY `index_timespans_on_before_date` (`before_date`)
);

Here is the EXPLAIN output:

 
+----+-------------+-----------+-------+--------------------------------------------------------------+-------------------------------+---------+------+--------+----------------------------------------------+
| id | select_type | table     | type  | possible_keys                                                | key                           | key_len | ref  | rows   | Extra                                        |
+----+-------------+-----------+-------+--------------------------------------------------------------+-------------------------------+---------+------+--------+----------------------------------------------+
|  1 | SIMPLE      | timespans | range | index_timespans_on_after_date,index_timespans_on_before_date | index_timespans_on_after_date | 9       | NULL |     84 | Using where; Using temporary; Using filesort |
|  1 | SIMPLE      | reports   | ALL   | NULL                                                         | NULL                          | NULL    | NULL | 183297 | Using where                                  |
+----+-------------+-----------+-------+--------------------------------------------------------------+-------------------------------+---------+------+--------+----------------------------------------------+

Here is the EXPLAIN output after I created an index on authored_at. As you can see, the index doesn't actually get used (I think...)

 
+----+-------------+-----------+-------+--------------------------------------------------------------+-------------------------------+---------+------+--------+------------------------------------------------+
| id | select_type | table     | type  | possible_keys                                                | key                           | key_len | ref  | rows   | Extra                                          |
+----+-------------+-----------+-------+--------------------------------------------------------------+-------------------------------+---------+------+--------+------------------------------------------------+
|  1 | SIMPLE      | timespans | range | index_timespans_on_after_date,index_timespans_on_before_date | index_timespans_on_after_date | 9       | NULL |     86 | Using where; Using temporary; Using filesort   |
|  1 | SIMPLE      | reports   | ALL   | index_reports_on_authored_at                                 | NULL                          | NULL    | NULL | 183317 | Range checked for each record (index map: 0x8) |
+----+-------------+-----------+-------+--------------------------------------------------------------+-------------------------------+---------+------+--------+------------------------------------------------+

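There are about 142K rows in the reports table, and far fewer in the timespans table.

The query takes about 3 seconds.

The strange thing is that if I add an index on reports.authored_at, the query actually gets slower: about 20 seconds. I expected the opposite, since the index should make it easy to find the reports at either end of the range and discard the rest, instead of examining every report.

Can someone clarify? I'm stumped.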

+3

Please post your EXPLAIN results and your table definitions, tkx – Neo 2011-04-14 04:48:37

+0

There really should be an index on 'reports.authored_at'. What does EXPLAIN say after that column is indexed? – Wiseguy 2011-04-14 05:10:07

Answers

1

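Instead of two separate indexes on the timespans table, try merging them into a single multi-column index that covers both before_date and after_date. Then add an index on authored_at as well.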
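
A rough sketch of what that could look like. The composite index name is made up for illustration, and putting after_date first is an assumption (it is the column the query filters on with a literal); `index_reports_on_authored_at` is the name that appears in the question's second EXPLAIN output.

```sql
-- Hypothetical index name; column order (after_date first) is an assumption.
-- The two existing single-column indexes are left in place here.
ALTER TABLE timespans
    ADD INDEX index_timespans_on_after_and_before (after_date, before_date);

-- Index on the column that the range conditions are applied to.
ALTER TABLE reports
    ADD INDEX index_reports_on_authored_at (authored_at);
```

Whether the optimizer actually picks these indexes is worth checking with EXPLAIN afterwards.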

1

I rewrote your query like this:

select t.id, count(*) as num from timespans t 
    join reports r where t.after_date >= '2011-04-13 22:08:38' 
    and r.authored_at >= '2011-04-13 22:08:38' 
    and r.authored_at < t.before_date 
group by t.id order by null; 

and added this index:

alter table reports add index authored_at_idx(authored_at); 
+0

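Amazing! Although r.authored_at should be compared with t.after_date rather than with the literal value. But this definitely fixes it. As far as I can see, the only real difference is the direction of the comparison: putting r.authored_at on the left-hand side makes it faster. I had no idea that made a difference! – user707270 2011-04-14 12:45:24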

+0

@user707270 You can think of it this way: MySQL is not smart enough to know that t.authored >= t.after_date is the same as r.authored >= '2011-04-13 22:08:38' – Neo 2011-04-14 14:52:48
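
For reference, a sketch of the rewritten query with the comparison against t.after_date restored, as the first comment suggests. The extra literal bound on r.authored_at is logically implied by the other two conditions, so it should not change the result; it is spelled out only because, going by the comments, MySQL may not derive it by itself. This has not been run against the asker's data.

```sql
SELECT t.id AS timespan_id, COUNT(*) AS num
FROM timespans t
JOIN reports r
  ON r.authored_at >= t.after_date      -- report falls inside the timespan
 AND r.authored_at <  t.before_date
WHERE t.after_date  >= '2011-04-13 22:08:38'
  AND r.authored_at >= '2011-04-13 22:08:38'  -- redundant bound, implied by the conditions above
GROUP BY t.id
ORDER BY NULL;                          -- skip the sort that GROUP BY would otherwise imply
```

Prefixing the statement with EXPLAIN would show whether the index on authored_at is now used for the reports table.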

0

You can use the database's partitioning feature on the after_date column. It will help you a lot.