2010-06-04
2

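Efficiently select the top row for each category in the set

I need to select the top row for each category from a known set (somewhat similar to this question). The problem is how to make this query efficient on a large number of rows.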

For example, let's create a table that stores temperature recordings for several places.

CREATE TABLE #t (
    placeId int, 
    ts datetime, 
    temp int, 
    PRIMARY KEY (ts, placeId) 
) 

-- insert some sample data 

SET NOCOUNT ON 

DECLARE @n int, @ts datetime 
SELECT @n = 1000, @ts = '2000-01-01' 

WHILE (@n>0) BEGIN 
    INSERT INTO #t VALUES (@n % 10, @ts, @n % 37) 
    IF (@n % 10 = 0) SET @ts = DATEADD(hour, 1, @ts) 
    SET @n = @n - 1 
END 

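Now I need to get the latest recording for each of the places 1, 2, 3.

This approach is efficient, but it does not scale well to more places (and it looks dirty).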

SELECT * FROM (
    SELECT TOP 1 placeId, temp 
    FROM #t 
    WHERE placeId = 1 
    ORDER BY ts DESC 
) t1 
UNION ALL 
SELECT * FROM (
    SELECT TOP 1 placeId, temp 
    FROM #t 
    WHERE placeId = 2 
    ORDER BY ts DESC 
) t2 
UNION ALL 
SELECT * FROM (
    SELECT TOP 1 placeId, temp 
    FROM #t 
    WHERE placeId = 3 
    ORDER BY ts DESC 
) t3 

The following looks better but is much less efficient (30% vs 70% of the batch cost, according to the optimizer).

SELECT placeId, ts, temp FROM (
    SELECT placeId, ts, temp, ROW_NUMBER() OVER (PARTITION BY placeId ORDER BY ts DESC) rownum 
    FROM #t 
    WHERE placeId IN (1, 2, 3) 
) t 
WHERE rownum = 1 

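The problem is that the execution plan of the latter query performs a clustered index scan on #t: 300 rows are retrieved, sorted, numbered, and then filtered, leaving only 3 rows. The former query simply fetches one row three times.

Is there a way to perform this query efficiently without lots of unions?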

+0

+1 for a question that includes sample code – 2010-06-04 15:17:59

Answers

1

I loaded 100,000 rows (which still wasn't enough to slow things down) and tried the old-fashioned way:

select t.* 
from #t t 
    inner join (select placeId, max(ts) ts 
       from #t 
       where placeId in (1,2,3) 
       group by placeId) xx 
    on xx.placeId = t.placeId 
    and xx.ts = t.ts 

and got pretty much the same results.

Then I reversed the order of the columns in the index, to

CREATE TABLE #t ( 
    placeId int, 
    ts datetime, 
    temp int, 
    PRIMARY KEY (placeId, ts) 
) 

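and saw, in all of the queries, fewer page reads and index seeks instead of scans.

If optimization is your goal and you are able to modify the indexes, I'd revise the primary key or add a covering index.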
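As a rough illustration of the covering-index option, the sketch below keeps the question's original primary key (ts, placeId) and adds a nonclustered index keyed on placeId and ts that also carries temp. The index name IX_t_placeId_ts is made up for this example, and whether this actually beats changing the primary key would need to be checked on real data with the execution plan and statistics io.

```sql
-- Assumes #t exists as defined in the question, with PRIMARY KEY (ts, placeId).
-- IX_t_placeId_ts is a hypothetical name chosen for this sketch.
CREATE NONCLUSTERED INDEX IX_t_placeId_ts
    ON #t (placeId, ts DESC)   -- lets each placeId be located with a seek, newest row first
    INCLUDE (temp);            -- temp is stored in the index leaf, so no lookup into the clustered index

-- The "latest row per place" query from the question, unchanged;
-- the optimizer may now choose a seek on the new index instead of a scan.
SELECT placeId, ts, temp FROM (
    SELECT placeId, ts, temp,
           ROW_NUMBER() OVER (PARTITION BY placeId ORDER BY ts DESC) rownum
    FROM #t
    WHERE placeId IN (1, 2, 3)
) t
WHERE rownum = 1;
```

The trade-off is the usual one for an extra index: reads of this shape may get cheaper, while every insert into #t has to maintain one more structure.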

+0

Thanks, I somehow missed the "old-fashioned way". It also works better on my real data structure. – VladV 2010-06-07 06:47:10

2

Don't just look at the execution plan; also look at statistics io and statistics time.

set statistics io on 
go 
SELECT * FROM (
    SELECT TOP 1 placeId, temp 
    FROM #t 
    WHERE placeId = 1 
    ORDER BY ts DESC 
) t1 
UNION ALL 
SELECT * FROM (
    SELECT TOP 1 placeId, temp 
    FROM #t 
    WHERE placeId = 2 
    ORDER BY ts DESC 
) t2 
UNION ALL 
SELECT * FROM (
    SELECT TOP 1 placeId, temp 
    FROM #t 
    WHERE placeId = 3 
    ORDER BY ts DESC 
) t3 

SELECT placeId, temp FROM (
    SELECT placeId, ts, temp, ROW_NUMBER() OVER (PARTITION BY placeId ORDER BY ts DESC) rownum 
    FROM #t 
    WHERE placeId IN (1, 2, 3) 
) t 
WHERE rownum = 1 

set statistics io off 
go 

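Table '#t000000000B99'. Scan count 3, logical reads 6, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#t000000000B99'. Scan count 1, logical reads 6, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.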

set statistics time on 
go 
SELECT * FROM (
    SELECT TOP 1 placeId, temp 
    FROM #t 
    WHERE placeId = 1 
    ORDER BY ts DESC 
) t1 
UNION ALL 
SELECT * FROM (
    SELECT TOP 1 placeId, temp 
    FROM #t 
    WHERE placeId = 2 
    ORDER BY ts DESC 
) t2 
UNION ALL 
SELECT * FROM (
    SELECT TOP 1 placeId, temp 
    FROM #t 
    WHERE placeId = 3 
    ORDER BY ts DESC 
) t3 

SELECT placeId, temp FROM (
    SELECT placeId, ts, temp, ROW_NUMBER() OVER (PARTITION BY placeId ORDER BY ts DESC) rownum 
    FROM #t 
    WHERE placeId IN (1, 2, 3) 
) t 
WHERE rownum = 1 

set statistics time off 
go 

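For me there is no real difference between the 2 methods; load up more data and compare again. Also, when you add an ORDER BY to both queries, the split drops to 40% vs 60%.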

SELECT * FROM (
    SELECT TOP 1 placeId, temp 
    FROM #t 
    WHERE placeId = 1 
    ORDER BY ts DESC 
) t1 
UNION ALL 
SELECT * FROM (
    SELECT TOP 1 placeId, temp 
    FROM #t 
    WHERE placeId = 2 
    ORDER BY ts DESC 
) t2 
UNION ALL 
SELECT * FROM (
    SELECT TOP 1 placeId, temp 
    FROM #t 
    WHERE placeId = 3 
    ORDER BY ts DESC 
) t3 
ORDER BY placeId 

SELECT placeId, temp FROM (
    SELECT placeId, temp, ROW_NUMBER() OVER (PARTITION BY placeId ORDER BY ts DESC) rownum 
    FROM #t 
    WHERE placeId IN (1, 2, 3) 
) t 
WHERE rownum = 1 
ORDER BY placeId 

0

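Just for the record, another option is to use CROSS APPLY.
On my configuration, it performs better than all of the previously mentioned approaches.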

SELECT * 
FROM (VALUES (1),(2),(3)) t (placeId) 
CROSS APPLY (
    SELECT TOP 1 ts, temp 
    FROM #t 
    WHERE placeId = t.placeId 
    ORDER BY ts DESC 
) tt 

I guess VALUES could be changed to a temp table or a table variable without much difference.
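A minimal sketch of that variant, assuming the places of interest are first loaded into a table variable. The name @places and the aliases p and x are invented for this example, and the multi-row VALUES insert needs SQL Server 2008 or later. This has not been measured against the VALUES version, so the "without much difference" guess remains a guess.

```sql
-- @places is a hypothetical table variable holding the known set of places.
DECLARE @places TABLE (placeId int PRIMARY KEY);
INSERT INTO @places (placeId) VALUES (1), (2), (3);

-- Same CROSS APPLY shape as above: for each place, pick its single newest row.
SELECT p.placeId, tt.ts, tt.temp
FROM @places p
CROSS APPLY (
    SELECT TOP 1 x.ts, x.temp
    FROM #t x
    WHERE x.placeId = p.placeId
    ORDER BY x.ts DESC
) tt;
```

Because CROSS APPLY drops outer rows that have no match, a place with no recordings in #t does not appear in the result; OUTER APPLY would keep it with NULLs for ts and temp.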