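Performance tuning when reading a huge table
I have a huge table with more than 100 million rows, and I have to query it to return a set of data in the shortest possible time.
So I created a test environment with this table definition: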
CREATE TABLE [dbo].[Test](
[Dim1ID] [nvarchar](20) NOT NULL,
[Dim2ID] [nvarchar](20) NOT NULL,
[Dim3ID] [nvarchar](4) NOT NULL,
[Dim4ID] [smalldatetime] NOT NULL,
[Dim5ID] [nvarchar](20) NOT NULL,
[Dim6ID] [nvarchar](4) NOT NULL,
[Dim7ID] [nvarchar](4) NOT NULL,
[Dim8ID] [nvarchar](4) NOT NULL,
[Dim9ID] [nvarchar](4) NOT NULL,
[Dim10ID] [nvarchar](4) NOT NULL,
[Dim11ID] [nvarchar](20) NOT NULL,
[Value] [decimal](21, 6) NOT NULL,
CONSTRAINT [PK_Test] PRIMARY KEY CLUSTERED
(
[Dim1ID] ASC,
[Dim2ID] ASC,
[Dim3ID] ASC,
[Dim4ID] ASC,
[Dim5ID] ASC,
[Dim6ID] ASC,
[Dim7ID] ASC,
[Dim8ID] ASC,
[Dim9ID] ASC,
[Dim10ID] ASC,
[Dim11ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
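This table is the fact table of a star schema (fact/dimension) architecture. As you can see, I have a clustered index on all columns except the "Value" column.
I have populated it with approx. 10,000,000 rows for testing purposes. Fragmentation is currently 0.01%.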
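For anyone reproducing the test setup, the fragmentation figure can be read from the `sys.dm_db_index_physical_stats` DMV. This is only a minimal sketch: it assumes it is run in the database that holds `dbo.Test`, and the `'LIMITED'` scan mode is a choice made here to keep the check cheap, not something stated above.

```sql
-- Report fragmentation and size for every index on dbo.Test.
-- Assumption: the current database is the one containing dbo.Test.
SELECT i.name                          AS IndexName,
       ps.index_type_desc              AS IndexType,
       ps.avg_fragmentation_in_percent AS FragmentationPercent,
       ps.page_count                   AS PageCount
FROM sys.dm_db_index_physical_stats(
         DB_ID(),                 -- current database
         OBJECT_ID(N'dbo.Test'),  -- the fact table from the definition above
         NULL,                    -- all indexes
         NULL,                    -- all partitions
         'LIMITED'                -- cheapest scan mode
     ) AS ps
INNER JOIN sys.indexes AS i
        ON i.object_id = ps.object_id
       AND i.index_id  = ps.index_id;
```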
I would like to improve performance when reading a set of rows from this table with this query:
DECLARE @Dim1ID nvarchar(20) = 'C1'
DECLARE @Dim9ID nvarchar(4) = 'VRT1'
DECLARE @Dim10ID nvarchar(4) = 'S1'
DECLARE @Dim6ID nvarchar(4) = 'FRA'
DECLARE @Dim7ID nvarchar(4) = '' -- empty = all
DECLARE @Dim8ID nvarchar(4) = '' -- empty = all
DECLARE @Dim2 TABLE (Dim2ID nvarchar(20) NOT NULL)
INSERT INTO @Dim2 VALUES ('A1'), ('A2'), ('A3'), ('A4');
DECLARE @Dim3 TABLE (Dim3ID nvarchar(4) NOT NULL)
INSERT INTO @Dim3 VALUES ('P1');
DECLARE @Dim4ID TABLE (Dim4ID smalldatetime NOT NULL)
INSERT INTO @Dim4ID VALUES ('2009-01-01'), ('2009-01-02'), ('2009-01-03');
DECLARE @Dim11 TABLE (Dim11ID nvarchar(20) NOT NULL)
INSERT INTO @Dim11 VALUES ('Var0001'), ('Var0040'), ('Var0060'), ('Var0099')
SELECT RD.Dim2ID,
RD.Dim3ID,
RD.Dim4ID,
RD.Dim5ID,
RD.Dim6ID,
RD.Dim7ID,
RD.Dim8ID,
RD.Dim9ID,
RD.Dim10ID,
RD.Dim11ID,
RD.Value
FROM dbo.Test RD
INNER JOIN @Dim2 R
ON RD.Dim2ID = R.Dim2ID
INNER JOIN @Dim3 C
ON RD.Dim3ID = C.Dim3ID
INNER JOIN @Dim4ID P
ON RD.Dim4ID = P.Dim4ID
INNER JOIN @Dim11 V
ON RD.Dim11ID = V.Dim11ID
WHERE RD.Dim1ID = @Dim1ID
AND RD.Dim9ID = @Dim9ID
AND ((@Dim6ID <> '' AND RD.Dim6ID = @Dim6ID) OR @Dim6ID = '')
AND ((@Dim7ID <> '' AND RD.Dim7ID = @Dim7ID) OR @Dim7ID = '')
AND ((@Dim8ID <> '' AND RD.Dim8ID = @Dim8ID) OR @Dim8ID = '')
I have tested this query, and it returned 180 rows with these timings: 1st execution: 1 min 32 sec; 2nd execution: 1 min.
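The gap between the two runs may simply reflect a cold versus warm buffer cache, although that is an assumption and was not measured. A minimal sketch of how such timings could be made repeatable on a test server follows. `DBCC DROPCLEANBUFFERS` empties the buffer cache for the whole instance and needs elevated permissions, so it belongs on a test machine only.

```sql
-- TEST SERVER ONLY: flush dirty pages, then empty the buffer cache
-- so that the next run starts cold.
CHECKPOINT;
DBCC DROPCLEANBUFFERS;

-- Report logical/physical reads and CPU/elapsed time per statement.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- Run the SELECT from above here (cold run), then run it a second
-- time without clearing the cache to get the warm-cache figures.

SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;
```

The logical reads reported for `Test` in the Messages output are usually a more stable number to compare between index designs than wall-clock time.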
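If possible, I would like to return the data within a few seconds.
I think I could add nonclustered indexes, but I don't know what the best way to set them up is! Would physically ordering the data in this table improve performance? Or are there solutions other than indexes?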
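To make the nonclustered index idea concrete, here is a sketch of one candidate that could be tested. It is untested and is not a recommendation. The index name is hypothetical, and the column order is an assumption: it puts the two columns that always get an equality filter (`Dim1ID`, `Dim9ID`) first, followed by the four columns joined to the table variables. Because the table has a clustered primary key, the remaining key columns are carried in the nonclustered index automatically, and `INCLUDE ([Value])` makes it covering for the query above.

```sql
-- Hypothetical candidate index; name and column order are assumptions.
CREATE NONCLUSTERED INDEX IX_Test_Dim1_Dim9_Dim2_Dim3_Dim4_Dim11
ON dbo.Test (Dim1ID, Dim9ID, Dim2ID, Dim3ID, Dim4ID, Dim11ID)
INCLUDE ([Value]);  -- avoids lookups into the clustered index for Value
```

Whether the optimizer actually uses it, and whether it beats the clustered index, would have to be checked in the execution plan against the 10,000,000-row test data.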
Thanks.
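Brain melt. What's with the variable naming? – 2011-03-28 16:26:36
For confidentiality reasons, I would rather not post the real names. But I suppose I could write dummy names. – Dan 2011-03-28 16:45:38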