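Here is my version. I really was presenting it as a mere curiosity, to show another way of thinking about the problem. It turned out to be more useful than that, because it performed better than even Martin Smith's cool "grouped islands" solution. Then again, once he got rid of some overly expensive aggregate windowing functions and used real aggregates instead, his query began to kick butt.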
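Solution 1: Runs of 3 months or more, found by checking 1 month before and 1 month after each month and then using a semi-join against the result.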
    WITH Months AS (
        -- One row per customer per calendar month that has at least one order.
        -- Grp is the number of months since January 2000.
        SELECT DISTINCT
            O.CustID,
            Grp = DateDiff(Month, '20000101', O.OrderDate)
        FROM
            CustOrder O
    ), Anchors AS (
        -- Ind is the middle month of three consecutive months with orders.
        SELECT
            M.CustID,
            Ind = M.Grp + X.Offset
        FROM
            Months M
            CROSS JOIN (
                SELECT -1 UNION ALL SELECT 0 UNION ALL SELECT 1
            ) X (Offset)
        GROUP BY
            M.CustID,
            M.Grp + X.Offset
        HAVING
            Count(*) = 3
    )
    SELECT
        C.CustName,
        [Year] = Year(OrderDate),
        O.OrderDate
    FROM
        Cust C
        INNER JOIN CustOrder O ON C.CustID = O.CustID
    WHERE
        -- Semi-join: keep an order if it falls in the 3-month window around any anchor.
        EXISTS (
            SELECT 1
            FROM
                Anchors A
            WHERE
                O.CustID = A.CustID
                AND O.OrderDate >= DateAdd(Month, A.Ind, '19991201')
                AND O.OrderDate < DateAdd(Month, A.Ind, '20000301')
        )
    ORDER BY
        C.CustName,
        OrderDate;
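To see what the Anchors step produces, here is a minimal sketch that runs the same grouping against a small, made-up set of month numbers. The table variable `@Months` and its sample rows are hypothetical and stand in for the Months CTE; the `RangeStart` and `RangeEnd` columns only spell out the date window that the EXISTS predicate tests.

```sql
-- Hypothetical sample data: months (Grp = months since January 2000)
-- in which customer 1 placed at least one order.
DECLARE @Months TABLE (CustID int NOT NULL, Grp int NOT NULL);

INSERT @Months (CustID, Grp)
SELECT 1, 5 UNION ALL   -- June 2000
SELECT 1, 6 UNION ALL   -- July 2000
SELECT 1, 7 UNION ALL   -- August 2000
SELECT 1, 8 UNION ALL   -- September 2000
SELECT 1, 11;           -- December 2000 (isolated month)

WITH Anchors AS (
    -- Each month "votes" for itself and both neighbours;
    -- a month number with 3 votes is the middle of a 3-month run.
    SELECT
        M.CustID,
        Ind = M.Grp + X.Offset
    FROM
        @Months M
        CROSS JOIN (
            SELECT -1 UNION ALL SELECT 0 UNION ALL SELECT 1
        ) X (Offset)
    GROUP BY
        M.CustID,
        M.Grp + X.Offset
    HAVING
        Count(*) = 3
)
SELECT
    A.CustID,
    A.Ind,
    RangeStart = DateAdd(Month, A.Ind, '19991201'), -- first day of month Ind - 1
    RangeEnd = DateAdd(Month, A.Ind, '20000301')    -- first day of month Ind + 2 (exclusive)
FROM
    Anchors A
ORDER BY
    A.Ind;
```

With this sample the anchors are months 6 and 7, covering 2000-06-01 up to 2000-09-01 and 2000-07-01 up to 2000-10-01. The two windows overlap, which is why the final query uses EXISTS rather than a plain join: an order in July or August matches both anchors but should be returned only once.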
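Solution 2: Exact 3-month patterns. If a run is 4 months or longer, its orders are excluded. This is done by checking 2 months before and 2 months after each month (essentially looking for the pattern N, Y, Y, Y, N).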
    WITH Months AS (
        -- One row per customer per calendar month that has at least one order.
        SELECT DISTINCT
            O.CustID,
            Grp = DateDiff(Month, '20000101', O.OrderDate)
        FROM
            CustOrder O
    ), Anchors AS (
        -- Ind is the middle month of a run of exactly three months:
        -- the three votes must come from offsets -1, 0 and 1 only.
        SELECT
            M.CustID,
            Ind = M.Grp + X.Offset
        FROM
            Months M
            CROSS JOIN (
                SELECT -2 UNION ALL SELECT -1 UNION ALL SELECT 0 UNION ALL SELECT 1 UNION ALL SELECT 2
            ) X (Offset)
        GROUP BY
            M.CustID,
            M.Grp + X.Offset
        HAVING
            Count(*) = 3
            AND Min(X.Offset) = -1
            AND Max(X.Offset) = 1
    )
    SELECT
        C.CustName,
        [Year] = Year(OrderDate),
        O.OrderDate
    FROM
        Cust C
        INNER JOIN CustOrder O ON C.CustID = O.CustID
        INNER JOIN Anchors A
            ON O.CustID = A.CustID
            AND O.OrderDate >= DateAdd(Month, A.Ind, '19991201')
            AND O.OrderDate < DateAdd(Month, A.Ind, '20000301')
    ORDER BY
        C.CustName,
        OrderDate;
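The N, Y, Y, Y, N check can be tried on its own with a small sketch like the one below. The table variable `@Months` and both sample customers are made up for illustration: customer 1 has a 4-month run and customer 2 has a run of exactly 3 months.

```sql
-- Hypothetical sample data (Grp = months since January 2000).
DECLARE @Months TABLE (CustID int NOT NULL, Grp int NOT NULL);

INSERT @Months (CustID, Grp)
SELECT 1, 5 UNION ALL SELECT 1, 6 UNION ALL SELECT 1, 7 UNION ALL SELECT 1, 8 -- 4-month run
UNION ALL
SELECT 2, 5 UNION ALL SELECT 2, 6 UNION ALL SELECT 2, 7;                      -- exact 3-month run

SELECT
    M.CustID,
    Ind = M.Grp + X.Offset
FROM
    @Months M
    CROSS JOIN (
        SELECT -2 UNION ALL SELECT -1 UNION ALL SELECT 0 UNION ALL SELECT 1 UNION ALL SELECT 2
    ) X (Offset)
GROUP BY
    M.CustID,
    M.Grp + X.Offset
HAVING
    Count(*) = 3             -- exactly three of the five months Ind-2 .. Ind+2 have orders
    AND Min(X.Offset) = -1   -- and those three are the adjacent ones,
    AND Max(X.Offset) = 1;   -- so months Ind-2 and Ind+2 are empty
```

Only customer 2 comes back, with Ind = 6. For customer 1, months 6 and 7 each collect four votes, while months 5 and 8 collect three votes that include an offset of -2 or 2, so the Min and Max tests reject them. Because runs of exactly three months cannot overlap, each order matches at most one anchor, which is why Solution 2 can use a plain INNER JOIN where Solution 1 needed EXISTS.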
Here's my table-loading script, if anyone else wants to play:
    IF Object_ID('CustOrder', 'U') IS NOT NULL DROP TABLE CustOrder
    IF Object_ID('Cust', 'U') IS NOT NULL DROP TABLE Cust
    GO
    SET NOCOUNT ON
    CREATE TABLE Cust (
        CustID int identity(1,1) NOT NULL PRIMARY KEY CLUSTERED,
        CustName varchar(100) UNIQUE
    )
    CREATE TABLE CustOrder (
        OrderID int identity(100, 1) NOT NULL PRIMARY KEY CLUSTERED,
        CustID int NOT NULL FOREIGN KEY REFERENCES Cust (CustID),
        OrderDate smalldatetime NOT NULL
    )
    DECLARE @i int
    SET @i = 1000
    -- Insert 1000 customers, one per iteration, each with a random six-letter name
    -- (one uppercase letter followed by five lowercase letters).
    WHILE @i > 0 BEGIN
        WITH N AS (
            SELECT
                Nm =
                    Char(Abs(Checksum(NewID())) % 26 + 65)
                    + Char(Abs(Checksum(NewID())) % 26 + 97)
                    + Char(Abs(Checksum(NewID())) % 26 + 97)
                    + Char(Abs(Checksum(NewID())) % 26 + 97)
                    + Char(Abs(Checksum(NewID())) % 26 + 97)
                    + Char(Abs(Checksum(NewID())) % 26 + 97)
        )
        INSERT Cust
        SELECT N.Nm
        FROM N
        WHERE NOT EXISTS (
            SELECT 1
            FROM Cust C
            WHERE
                N.Nm = C.CustName
        )
        SET @i = @i - @@RowCount
    END
    -- @i is now 0. Insert 50000 orders, each for a random customer
    -- on a random day within 10000 days after 1990-01-01.
    WHILE @i < 50000 BEGIN
        INSERT CustOrder
        SELECT TOP (50000 - @i)
            Abs(Checksum(NewID())) % 1000 + 1,
            DateAdd(Day, Abs(Checksum(NewID())) % 10000, '19900101')
        FROM master.dbo.spt_values
        SET @i = @i + @@RowCount
    END
Performance
Here are some performance test results for the 3-months-or-more queries:
    Query      CPU   Reads  Duration
    Martin 1  2297  299412      2348
    Martin 2   625     285       809
    Denis     3641     401      3855
    Erik      1855   94727      2077
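That's just one run of each, but the numbers are fairly representative. It turns out your query wasn't so bad after all, Denis. Martin's query beats the others, though at first it used some overly expensive windowing-function strategies, which he has since fixed.
Of course, as I noted, Denis's query doesn't pull the right rows when a customer has two orders on the same day, so it is out of contention unless he fixes it.
Also, different indexes could change things. I don't know.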
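For anyone who wants to try the index question, here is a rough sketch. It assumes the tables from the loading script above. The index name and key columns are my own guess at something worth trying, not what was used for the numbers above, and the STATISTICS settings are just one way to get CPU, reads and elapsed time per statement, not necessarily how those numbers were captured.

```sql
-- Hypothetical index: the name and key columns are assumptions for illustration.
CREATE NONCLUSTERED INDEX IX_CustOrder_CustID_OrderDate
    ON CustOrder (CustID, OrderDate);
GO
-- Report logical reads and CPU/elapsed time for each statement in this session.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;
GO
-- Put the query under test here. As a placeholder, this runs the Months step alone.
SELECT DISTINCT
    O.CustID,
    Grp = DateDiff(Month, '20000101', O.OrderDate)
FROM
    CustOrder O;
GO
SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

Running each candidate query with and without the index, and comparing the figures in the Messages output, would show whether the ranking changes.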
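What output do you want if the row '113,13-AUG-2007,1' is added to the orders table? One block of output with 4 rows for AA, or two blocks of output with 3 rows each? Put another way, is it "strictly three months at a time" or "three or more months at a time"? – 2010-09-19 00:40:00
Sorry, I prefer three months – Gopi 2010-09-20 15:22:57
Do you mean that a 4-month run would return 6 rows, one set for months 1, 2, 3 and another for months 2, 3, 4, or should all orders that aren't in a run of exactly 3 months simply be excluded? – ErikE 2010-09-20 17:04:06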