2010-09-18 73 views
8

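Please help me write the following query. Say I have a Customer table and an Order table. T-SQL: finding orders that occurred in 3 consecutive months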

Customer table

CustID CustName 

1  AA  
2  BB 
3  CC 
4  DD 

Order table

OrderID OrderDate   CustID 
100  01-JAN-2000  1 
101  05-FEB-2000  1  
102  10-MAR-2000  1 
103  01-NOV-2000  2  
104  05-APR-2001  2 
105  07-MAR-2002  2 
106  01-JUL-2003  1 
107  01-SEP-2004  4 
108  01-APR-2005  4 
109  01-MAY-2006  3 
110  05-MAY-2007  1 
111  07-JUN-2007  1 
112  06-JUL-2007  1 

I want to find the customers who placed orders in 3 consecutive months. (Queries using SQL Server 2005 and 2008 features are allowed.)

The desired output is:

CustName  Year OrderDate 

    AA  2000 01-JAN-2000  
    AA  2000 05-FEB-2000 
    AA  2000 10-MAR-2000 

    AA  2007 05-MAY-2007   
    AA  2007 07-JUN-2007   
    AA  2007 06-JUL-2007   
+0

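If the row '113, 13-AUG-2007, 1' were added to the Order table, what output would you want? One output block of 4 rows for AA, or two output blocks of 3 rows each? In other words, do you want "exactly three months at a time" or "three or more months at a time"? – 2010-09-19 00:40:00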

+0

Sorry, I would prefer three months – Gopi 2010-09-20 15:22:57

+0

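Do you mean that a 4-month run should return 6 rows, one set for months 1, 2, 3 and another for months 2, 3, 4, or that all orders not in a run of exactly 3 months should be excluded? – ErikE 2010-09-20 17:04:06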

Answers

7

Edit: Got rid of the MAX() OVER (PARTITION BY ...) as it seemed to kill performance.

;WITH cte AS ( 
-- Month index: consecutive calendar months differ by exactly 1
SELECT CustID , 
      OrderDate, 
      DATEPART(YEAR, OrderDate)*12 + DATEPART(MONTH, OrderDate) AS YM 
FROM  Orders 
), 
cte1 AS ( 
-- YM minus its dense rank is constant within a run of consecutive months
SELECT CustID , 
      OrderDate, 
      YM, 
      YM - DENSE_RANK() OVER (PARTITION BY CustID ORDER BY YM) AS G 
FROM  cte 
), 
cte2 As 
(
-- Keep only runs that span at least 3 months, with their first and last order dates
SELECT CustID , 
      MIN(OrderDate) AS Mn, 
      MAX(OrderDate) AS Mx 
FROM cte1 
GROUP BY CustID, G 
HAVING MAX(YM)-MIN(YM) >=2 
) 
-- Return every order that falls inside one of the qualifying runs
SELECT  c.CustName, o.OrderDate, YEAR(o.OrderDate) AS YEAR 
FROM   Customers AS c INNER JOIN 
         Orders AS o ON c.CustID = o.CustID 
INNER JOIN cte2 c2 ON c2.CustID = o.CustID and o.OrderDate between Mn and Mx 
order by c.CustName, o.OrderDate 
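
To see why subtracting a dense rank from the month index groups consecutive months together, here is a minimal sketch that traces the same arithmetic on customer 1's orders from the question. It is written in Python rather than T-SQL purely for illustration, and the function name `consecutive_month_runs` is made up for this example.

```python
from collections import defaultdict
from datetime import date

def consecutive_month_runs(order_dates, min_months=3):
    """Group dates into runs of consecutive calendar months (gaps and islands)."""
    # Month index, the same idea as YEAR*12 + MONTH in the query above.
    ym = {d: d.year * 12 + d.month for d in order_dates}
    # Dense rank over the distinct month indexes: orders in the same month share a rank.
    rank = {m: i for i, m in enumerate(sorted(set(ym.values())), start=1)}
    islands = defaultdict(list)
    for d in sorted(order_dates):
        # ym - rank stays constant for as long as the months are consecutive.
        islands[ym[d] - rank[ym[d]]].append(d)
    # Keep islands whose first and last month are at least min_months - 1 apart.
    return [ds for ds in islands.values()
            if ym[ds[-1]] - ym[ds[0]] >= min_months - 1]

# Customer 1's orders from the question's sample data.
cust1 = [date(2000, 1, 1), date(2000, 2, 5), date(2000, 3, 10), date(2003, 7, 1),
         date(2007, 5, 5), date(2007, 6, 7), date(2007, 7, 6)]
runs = consecutive_month_runs(cust1)
```

For this data `runs` holds two groups, January to March 2000 and May to July 2007, while the lone July 2003 order forms an island of one month and is dropped. Because the rank is dense, two orders in the same month share a key and do not split a run.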
+1

You need DENSE_RANK here, or a customer with four or more sales within the three months will be ignored. – 2010-09-18 22:09:55

+1

A perfect islands solution... – ErikE 2010-09-20 16:20:01

+0

Martin, I tested your query and it didn't give the correct results... – ErikE 2010-09-20 20:05:11

1

Here you go:

select distinct 
CustName 
,year(OrderDate) [Year] 
,OrderDate 
from 
(
select 
o2.OrderDate [prev] 
,o1.OrderDate [curr] 
,o3.OrderDate [next] 
,c.CustName 
from [order] o1 
join [order] o2 on o1.CustId = o2.CustId and datediff(mm, o2.OrderDate, o1.OrderDate) = 1 
join [order] o3 on o1.CustId = o3.CustId and o2.OrderId <> o3.OrderId and datediff(mm, o3.OrderDate, o1.OrderDate) = -1 
join Customer c on c.CustId = o1.CustId 
) t 
unpivot 
(
    OrderDate for [DateName] in ([prev], [curr], [next]) 
) 
unpvt 
order by CustName, OrderDate 
+0

Warning: this query is extremely inefficient. :) – 2010-09-18 22:58:40

+0

Denis, I'm sorry to report that this query doesn't return the correct results when the same customer has two orders on the same day. – ErikE 2010-09-20 22:09:16

+0

@Emtucifor, I know! But we don't know what @CSharpy needs! :) – 2010-09-21 06:37:28

4

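Here is my version. I really only posted it as a curiosity, to show another way of thinking about the problem. It turned out to be more useful than that, because it performed better than even Martin Smith's cool "islands" solution. However, once he got rid of some overly expensive windowed aggregate functions and did real aggregates instead, his query started kicking butt.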

Solution 1: Runs of 3 months or more, found by checking 1 month back and 1 month ahead and using a semi-join.

WITH Months AS (
    SELECT DISTINCT 
     O.CustID, 
     Grp = DateDiff(Month, '20000101', O.OrderDate) 
    FROM 
     CustOrder O 
), Anchors AS (
    SELECT 
     M.CustID, 
     Ind = M.Grp + X.Offset 
    FROM 
     Months M 
     CROSS JOIN (
     SELECT -1 UNION ALL SELECT 0 UNION ALL SELECT 1 
    ) X (Offset) 
    GROUP BY 
     M.CustID, 
     M.Grp + X.Offset 
    HAVING 
     Count(*) = 3 
) 
SELECT 
    C.CustName, 
    [Year] = Year(OrderDate), 
    O.OrderDate 
FROM 
    Cust C 
    INNER JOIN CustOrder O ON C.CustID = O.CustID 
WHERE 
    EXISTS (
     SELECT 1 
     FROM 
     Anchors A 
     WHERE 
     O.CustID = A.CustID 
     AND O.OrderDate >= DateAdd(Month, A.Ind, '19991201') 
     AND O.OrderDate < DateAdd(Month, A.Ind, '20000301') 
    ) 
ORDER BY 
    C.CustName, 
    OrderDate; 
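
The anchor idea in Solution 1 can be traced with a short sketch: every distinct order month "votes" for itself and its two neighbours, and a month that collects three votes sits in the middle of at least three consecutive months. This is Python rather than T-SQL, for illustration only; the names `month_no`, `anchor_months`, and `orders_in_runs` are hypothetical.

```python
from collections import Counter
from datetime import date

BASE = date(2000, 1, 1)  # same reference date as DateDiff(Month, '20000101', ...)

def month_no(d):
    """Whole calendar months between BASE and d, ignoring the day of the month."""
    return (d.year - BASE.year) * 12 + (d.month - BASE.month)

def anchor_months(order_dates):
    """Months whose previous and next months also contain at least one order."""
    months = {month_no(d) for d in order_dates}  # plays the role of SELECT DISTINCT
    # Each month votes for itself and both neighbours (the CROSS JOIN with -1, 0, 1).
    votes = Counter(m + off for m in months for off in (-1, 0, 1))
    # Three votes: the month before, the month itself, and the month after all exist.
    return sorted(m for m, n in votes.items() if n == 3)

def orders_in_runs(order_dates):
    """Orders within one month of any anchor (the EXISTS semi-join)."""
    anchors = anchor_months(order_dates)
    return sorted(d for d in order_dates
                  if any(abs(month_no(d) - a) <= 1 for a in anchors))

# Customer 1's orders from the question's sample data.
cust1 = [date(2000, 1, 1), date(2000, 2, 5), date(2000, 3, 10), date(2003, 7, 1),
         date(2007, 5, 5), date(2007, 6, 7), date(2007, 7, 6)]
```

For `cust1` the anchors are months 1 (February 2000) and 89 (June 2007), so the six orders around them are kept and the July 2003 order is dropped. A 4-month run produces two adjacent anchors, which is why the query uses EXISTS rather than a join: each order is returned once even when several anchors cover it.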

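Solution 2: Exact 3-month patterns. If a run is 4 months or longer, its orders are excluded. This is done by checking 2 months back and 2 months ahead (basically looking for the pattern N, Y, Y, Y, N).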

WITH Months AS (
    SELECT DISTINCT 
     O.CustID, 
     Grp = DateDiff(Month, '20000101', O.OrderDate) 
    FROM 
     CustOrder O 
), Anchors AS (
    SELECT 
     M.CustID, 
     Ind = M.Grp + X.Offset 
    FROM 
     Months M 
     CROSS JOIN (
     SELECT -2 UNION ALL SELECT -1 UNION ALL SELECT 0 UNION ALL SELECT 1 UNION ALL SELECT 2 
    ) X (Offset) 
    GROUP BY 
     M.CustID, 
     M.Grp + X.Offset 
    HAVING 
     Count(*) = 3 
     AND Min(X.Offset) = -1 
     AND Max(X.Offset) = 1 
) 
SELECT 
    C.CustName, 
    [Year] = Year(OrderDate), 
    O.OrderDate 
FROM 
    Cust C 
    INNER JOIN CustOrder O ON C.CustID = O.CustID 
    INNER JOIN Anchors A 
     ON O.CustID = A.CustID 
     AND O.OrderDate >= DateAdd(Month, A.Ind, '19991201') 
     AND O.OrderDate < DateAdd(Month, A.Ind, '20000301') 
ORDER BY 
    C.CustName, 
    OrderDate; 
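
The N, Y, Y, Y, N test in Solution 2 can be sketched the same way. Each month now votes for the five positions from two months before to two months after, and a position is an anchor only if the offsets it received are exactly -1, 0, and 1. That is what Count(*) = 3 with Min = -1 and Max = 1 expresses. The sketch is in Python for illustration, works on plain month numbers, and uses a hypothetical function name.

```python
from collections import defaultdict

def exact_three_anchors(month_numbers):
    """Centre months of runs that are exactly three months long."""
    offsets_seen = defaultdict(set)
    for m in set(month_numbers):           # distinct months only
        for off in (-2, -1, 0, 1, 2):      # the five-row CROSS JOIN
            offsets_seen[m + off].add(off)
    # Exactly {-1, 0, 1}: both neighbours and the centre exist,
    # and nothing exists two months away on either side.
    return sorted(ind for ind, offs in offsets_seen.items()
                  if offs == {-1, 0, 1})

three_run = exact_three_anchors([0, 1, 2])        # N, Y, Y, Y, N
four_run = exact_three_anchors([0, 1, 2, 3])      # run too long
split_run = exact_three_anchors([0, 1, 2, 4])     # gap after the run
```

Here `three_run` and `split_run` both give `[1]`, while `four_run` is empty because every candidate centre sees an order two months away. Since an exact 3-month run has only one anchor, a plain inner join is enough and no semi-join is needed.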

Here's my table-loading script, if anyone else wants to play:

IF Object_ID('CustOrder', 'U') IS NOT NULL DROP TABLE CustOrder 
IF Object_ID('Cust', 'U') IS NOT NULL DROP TABLE Cust 
GO 
SET NOCOUNT ON 
CREATE TABLE Cust (
    CustID int identity(1,1) NOT NULL PRIMARY KEY CLUSTERED, 
    CustName varchar(100) UNIQUE 
) 

CREATE TABLE CustOrder (
    OrderID int identity(100, 1) NOT NULL PRIMARY KEY CLUSTERED, 
    CustID int NOT NULL FOREIGN KEY REFERENCES Cust (CustID), 
    OrderDate smalldatetime NOT NULL 
) 

DECLARE @i int 
SET @i = 1000 
WHILE @i > 0 BEGIN 
    WITH N AS (
     SELECT 
     Nm = 
      Char(Abs(Checksum(NewID())) % 26 + 65) 
      + Char(Abs(Checksum(NewID())) % 26 + 97) 
      + Char(Abs(Checksum(NewID())) % 26 + 97) 
      + Char(Abs(Checksum(NewID())) % 26 + 97) 
      + Char(Abs(Checksum(NewID())) % 26 + 97) 
      + Char(Abs(Checksum(NewID())) % 26 + 97) 
    ) 
    INSERT Cust 
    SELECT N.Nm 
    FROM N 
    WHERE NOT EXISTS (
     SELECT 1 
     FROM Cust C 
     WHERE 
     N.Nm = C.CustName 
    ) 

    SET @i = @i - @@RowCount 
END 
WHILE @i < 50000 BEGIN 
    INSERT CustOrder 
    SELECT TOP (50000 - @i) 
     Abs(Checksum(NewID())) % 1000 + 1, 
     DateAdd(Day, Abs(Checksum(NewID())) % 10000, '19900101') 
    FROM master.dbo.spt_values 
    SET @i = @i + @@RowCount 
END 

Performance

Here are some performance test results for the 3-months-or-more queries:

Query     CPU   Reads   Duration
Martin 1  2297  299412  2348
Martin 2   625     285   809
Denis     3641     401  3855
Erik      1855   94727  2077

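This is just one run of each, but the numbers are fairly representative. It turns out your query wasn't so bad after all, Denis. Martin's query beats the others hands down, but at first he was using some overly expensive window function strategies, which he has since fixed.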

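Of course, as I noted, Denis's query doesn't pull the right rows when a customer has two orders on the same day, so his query is out of contention unless he fixes that.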

Also, different indexes could shake things up. I don't know.

+0

Don't make me add another two joins to my solution, it's already three-dimensional. :P – 2010-09-20 21:22:57

+0

You need to update your performance chart! – 2010-09-20 23:57:25

+1

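Done. To show that not all window function operations are awesome, I left the statistics for the old version in. Using them indiscriminately can hurt performance. – ErikE 2010-09-21 00:19:40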

0

Here is my query.

select 100 as OrderID,convert(datetime,'01-JAN-2000') OrderDate, 1 as CustID into #tmp union 
    select 101,convert(datetime,'05-FEB-2000'),  1 union 
    select 102,convert(datetime,'10-MAR-2000'),  1 union 
    select 103,convert(datetime,'01-NOV-2000'),  2 union 
    select 104,convert(datetime,'05-APR-2001'),  2 union 
    select 105,convert(datetime,'07-MAR-2002'),  2 union 
    select 106,convert(datetime,'01-JUL-2003'),  1 union 
    select 107,convert(datetime,'01-SEP-2004'),  4 union 
    select 108,convert(datetime,'01-APR-2005'),  4 union 
    select 109,convert(datetime,'01-MAY-2006'),  3 union 
    select 110,convert(datetime,'05-MAY-2007'),  1 union 
    select 111,convert(datetime,'07-JUN-2007'),  1 union 
    select 112,convert(datetime,'06-JUL-2007'),  1 


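    ;with cte as 
    (
     select 
      * 
      -- year*12 + month steps by exactly 1 per month, including across a year boundary
      ,year(OrderDate)*12 + month(OrderDate) as ym 
      -- ranking by month (not by date) keeps several orders in one month in the same group
      ,year(OrderDate)*12 + month(OrderDate) 
        - dense_rank() over(partition by CustID order by year(OrderDate)*12 + month(OrderDate)) as g 
     from #tmp 
    ), 
    cte2 as 
    (
    select 
     CustID 
     ,g 
    from cte a 
    group by CustID, g 
    -- count distinct months, not orders, so three orders in one month do not qualify
    having count(distinct ym)>=3 
    ) 
    select 
     a.CustID 
     ,Yr=Year(OrderDate) 
     ,OrderDate 
    from cte2 a join cte b 
     on a.CustID=b.CustID and a.g=b.g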
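
One detail worth checking in any rank-based grouping like this is the month key itself: the trick relies on consecutive months differing by exactly 1. A quick sketch (Python for illustration, with hypothetical helper names) compares a YYYYMM-style integer with a year*12 + month index across a year boundary.

```python
from datetime import date

def yyyymm(d):
    """Month key in the style of convert(char(6), date, 112) cast to int."""
    return d.year * 100 + d.month

def month_index(d):
    """Month key that advances by exactly 1 for every calendar month."""
    return d.year * 12 + d.month

dec, jan = date(2000, 12, 1), date(2001, 1, 1)
gap_yyyymm = yyyymm(jan) - yyyymm(dec)            # 200101 - 200012
gap_index = month_index(jan) - month_index(dec)   # 24013 - 24012
```

The YYYYMM gap from December to January is 89 while the month-index gap is 1, so only the second encoding keeps a November, December, January run in a single group.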