2012-02-14 59 views
8

Create groups of consecutive days meeting a given criterion

I have a table with the following data structure in SQL Server:

ID, Date, Allocation
1, 2012-01-01, 0 
2, 2012-01-02, 2 
3, 2012-01-03, 0 
4, 2012-01-04, 0 
5, 2012-01-05, 0 
6, 2012-01-06, 5 

What I need to do is get all periods of consecutive days where Allocation = 0, in the following form:

Start Date  End Date    DayCount
2012-01-01  2012-01-01  1
2012-01-03  2012-01-05  3

Is it possible to do this in SQL, and if so, how?

+0

@istari Is the end date a column in your table structure? – Devjosh 2012-02-14 10:09:04

+0

Have you tried using a cursor, or do you want to avoid cursors? – Vikram 2012-02-14 10:31:31

+0

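Do you mean "consecutive" as in "one day apart", or as in "adjacent when the rows are sorted by date"? That is, does each unique date appear exactly once in the 'Date' column? – gcbenison 2012-02-14 13:40:25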

Answers

3

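In this answer, I'll assume that the "id" field numbers the rows consecutively when they are sorted by increasing date, as it does in the example data. (If no such column exists, one can be created.)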

This is an example of the technique described here and here.

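1) Join the table to itself on adjacent "id" values. This pairs up adjacent rows. Select the rows where the "allocation" field has changed. Store the result in a temporary table, keeping a running index as well.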

SET @idx = 0; 
CREATE TEMPORARY TABLE boundaries 
SELECT 
    (@idx := @idx + 1) AS idx, 
    a1.date AS prev_end, 
    a2.date AS next_start, 
    a1.allocation as allocation 
FROM allocations a1 
JOIN allocations a2 
ON (a2.id = a1.id + 1) 
WHERE a1.allocation != a2.allocation; 

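This gives you a table with "the end of the previous period", "the start of the next period", and "the value of 'allocation' in the previous period" in each row: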

+-----+------------+------------+------------+
| idx | prev_end   | next_start | allocation |
+-----+------------+------------+------------+
|   1 | 2012-01-01 | 2012-01-02 |          0 |
|   2 | 2012-01-02 | 2012-01-03 |          2 |
|   3 | 2012-01-05 | 2012-01-06 |          0 |
+-----+------------+------------+------------+

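2) We need the start and the end of each period in the same row, so we need to combine adjacent rows again. Do this by creating a second temporary table, boundaries2, that is like boundaries but has an idx field that is 1 greater: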

+-----+------------+------------+
| idx | prev_end   | next_start |
+-----+------------+------------+
|   2 | 2012-01-01 | 2012-01-02 |
|   3 | 2012-01-02 | 2012-01-03 |
|   4 | 2012-01-05 | 2012-01-06 |
+-----+------------+------------+

Now join on the idx field, and we get the answer:

SELECT 
    boundaries2.next_start AS start, 
    boundaries.prev_end AS end, 
    allocation 
FROM boundaries 
JOIN boundaries2 
USING(idx); 

+------------+------------+------------+
| start      | end        | allocation |
+------------+------------+------------+
| 2012-01-02 | 2012-01-02 |          2 |
| 2012-01-03 | 2012-01-05 |          0 |
+------------+------------+------------+

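**Note that this answer gets the "internal" periods right but misses the two "edge" periods: the one at the beginning where allocation = 0 and the one at the end where allocation = 5. These can be pulled in with UNION clauses, but I wanted to present the core idea without that complication.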
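To make the boundary idea and the missing "edge" periods concrete, here is a minimal Python sketch of the same logic. It is not the SQL above: the `rows` list and the `periods` function are hypothetical names introduced for illustration, and the sketch assumes, like the answer, that the rows are already ordered by date with no missing days.

```python
from datetime import date

# Sample rows from the question: (id, date, allocation), ordered by date.
rows = [
    (1, date(2012, 1, 1), 0),
    (2, date(2012, 1, 2), 2),
    (3, date(2012, 1, 3), 0),
    (4, date(2012, 1, 4), 0),
    (5, date(2012, 1, 5), 0),
    (6, date(2012, 1, 6), 5),
]

def periods(rows):
    """Split date-ordered rows into runs of equal allocation, edges included."""
    result = []
    start = 0
    for i in range(1, len(rows) + 1):
        # A boundary is a change in allocation between adjacent rows;
        # reaching the end of the data also closes the current run.
        if i == len(rows) or rows[i][2] != rows[i - 1][2]:
            result.append((rows[start][1], rows[i - 1][1], rows[start][2]))
            start = i
    return result

all_periods = periods(rows)

# Keep only the allocation = 0 runs and add the day count.
zero_periods = [(s, e, (e - s).days + 1) for s, e, a in all_periods if a == 0]
```

Treating the end of the data as an implicit boundary is what recovers the first and last periods that the join on adjacent rows leaves out.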

0

A solution without a CTE:

SELECT a.aDate AS StartDate 
    , MIN(c.aDate) AS EndDate 
    , (datediff(day, a.aDate, MIN(c.aDate)) + 1) AS DayCount 
FROM (
    SELECT x.aDate, x.allocation, COUNT(*) idn FROM table1 x 
    JOIN table1 y ON y.aDate <= x.aDate 
    GROUP BY x.id, x.aDate, x.allocation 
) AS a 
LEFT JOIN (
    SELECT x.aDate, x.allocation, COUNT(*) idn FROM table1 x 
    JOIN table1 y ON y.aDate <= x.aDate 
    GROUP BY x.id, x.aDate, x.allocation 
) AS b ON a.idn = b.idn + 1 AND b.allocation = a.allocation 
LEFT JOIN (
    SELECT x.aDate, x.allocation, COUNT(*) idn FROM table1 x 
    JOIN table1 y ON y.aDate <= x.aDate 
    GROUP BY x.id, x.aDate, x.allocation 
) AS c ON a.idn <= c.idn AND c.allocation = a.allocation 
LEFT JOIN (
    SELECT x.aDate, x.allocation, COUNT(*) idn FROM table1 x 
    JOIN table1 y ON y.aDate <= x.aDate 
    GROUP BY x.id, x.aDate, x.allocation 
) AS d ON c.idn = d.idn - 1 AND d.allocation = c.allocation 
WHERE b.idn IS NULL AND c.idn IS NOT NULL AND d.idn IS NULL AND a.allocation = 0 
GROUP BY a.aDate 

Example

+0

When I run this, I get the following error message: Msg 530, Level 16, State 1, Line 1 The statement terminated. In statement c – Istari 2012-02-14 11:17:19

3

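The following will do it. The gist of the solution is:

  • Use a CTE to get a list of all consecutive start and end dates with Allocation = 0
  • Use the ROW_NUMBER window function to assign row numbers based on both the start and the end dates
  • Select only those records where both ROW_NUMBERs equal 1
  • Use DATEDIFF to calculate the DayCount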
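The steps above can be sketched in Python to show why both row numbers must equal 1. This is only an illustration of the logic, not the SQL itself; the names `table`, `pairs`, `latest_end` and `earliest_start` are assumptions made up for this sketch.

```python
from datetime import date, timedelta

# Sample data from the question: date -> allocation.
table = {
    date(2012, 1, 1): 0,
    date(2012, 1, 2): 2,
    date(2012, 1, 3): 0,
    date(2012, 1, 4): 0,
    date(2012, 1, 5): 0,
    date(2012, 1, 6): 5,
}
one_day = timedelta(days=1)
zero_days = sorted(d for d, a in table.items() if a == 0)
zero_set = set(zero_days)

# Step 1 (the recursive CTE): every (start, end) pair that can be reached
# by walking forward one day at a time through Allocation = 0 rows.
pairs = []
for start in zero_days:
    end = start
    pairs.append((start, end))
    while end + one_day in zero_set:
        end += one_day
        pairs.append((start, end))

# Steps 2 and 3 (the two ROW_NUMBER filters): keep a pair only if it has
# the latest end for its start AND the earliest start for its end.
latest_end = {}
earliest_start = {}
for s, e in pairs:
    latest_end[s] = max(latest_end.get(s, e), e)
    earliest_start[e] = min(earliest_start.get(e, s), s)

# Step 4 (DATEDIFF + 1): inclusive day count.
result = sorted(
    (s, e, (e - s).days + 1)
    for s, e in pairs
    if latest_end[s] == e and earliest_start[e] == s
)
```

The first filter alone would still keep partial runs such as 2012-01-04 to 2012-01-05; requiring the earliest start for each end date removes them.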

SQL statement

;WITH r AS (
    SELECT StartDate = Date, EndDate = Date 
    FROM YourTable 
    WHERE Allocation = 0 
    UNION ALL 
    SELECT r.StartDate, q.Date 
    FROM r 
      INNER JOIN YourTable q ON DATEDIFF(dd, r.EndDate, q.Date) = 1 
    WHERE q.Allocation = 0   
) 
SELECT [Start Date] = s.StartDate 
     , [End Date] = s.EndDate 
     , [DayCount] = DATEDIFF(dd, s.StartDate, s.EndDate) + 1 
FROM (
      SELECT * 
        , rn1 = ROW_NUMBER() OVER (PARTITION BY StartDate ORDER BY EndDate DESC) 
        , rn2 = ROW_NUMBER() OVER (PARTITION BY EndDate ORDER BY StartDate ASC) 
      FROM r   
     ) s 
WHERE s.rn1 = 1 
     AND s.rn2 = 1 
OPTION (MAXRECURSION 0) 

Test script

;WITH q (ID, Date, Allocation) AS (
    SELECT * FROM (VALUES 
    (1, '2012-01-01', 0) 
    , (2, '2012-01-02', 2) 
    , (3, '2012-01-03', 0) 
    , (4, '2012-01-04', 0) 
    , (5, '2012-01-05', 0) 
    , (6, '2012-01-06', 5) 
) a (a, b, c) 
) 
, r AS (
    SELECT StartDate = Date, EndDate = Date 
    FROM q 
    WHERE Allocation = 0 
    UNION ALL 
    SELECT r.StartDate, q.Date 
    FROM r 
      INNER JOIN q ON DATEDIFF(dd, r.EndDate, q.Date) = 1 
    WHERE q.Allocation = 0   
) 
SELECT s.StartDate, s.EndDate, DATEDIFF(dd, s.StartDate, s.EndDate) + 1 
FROM (
      SELECT * 
        , rn1 = ROW_NUMBER() OVER (PARTITION BY StartDate ORDER BY EndDate DESC) 
        , rn2 = ROW_NUMBER() OVER (PARTITION BY EndDate ORDER BY StartDate ASC) 
      FROM r   
     ) s 
WHERE s.rn1 = 1 
     AND s.rn2 = 1 
OPTION (MAXRECURSION 0) 
+0

@Istari The maximum recursion of 100 had been exhausted. I have suggested a MAXRECURSION option to fix the error message. – 2012-02-14 12:27:47

1

An alternative way, with a CTE but without ROW_NUMBER().

Sample data:

if object_id('tempdb..#tab') is not null 
    drop table #tab 

create table #tab (id int, date datetime, allocation int) 

insert into #tab 
select 1, '2012-01-01', 0 union 
select 2, '2012-01-02', 2 union 
select 3, '2012-01-03', 0 union 
select 4, '2012-01-04', 0 union 
select 5, '2012-01-05', 0 union 
select 6, '2012-01-06', 5 union 
select 7, '2012-01-07', 0 union 
select 8, '2012-01-08', 5 union 
select 9, '2012-01-09', 0 union 
select 10, '2012-01-10', 0 

Query:

;with cte(s_id, e_id, b_id) as (
    select s.id, e.id, b.id 
    from #tab s 
    left join #tab e on dateadd(dd, 1, s.date) = e.date and e.allocation = 0 
    left join #tab b on dateadd(dd, -1, s.date) = b.date and b.allocation = 0 
    where s.allocation = 0 
) 
select ts.date as [start date], te.date as [end date], count(*) as [day count] from (
    select c1.s_id as s, (
     select min(s_id) from cte c2 
     where c2.e_id is null and c2.s_id >= c1.s_id 
    ) as e 
    from cte c1 
    where b_id is null 
) t 
join #tab t1 on t1.id between t.s and t.e and t1.allocation = 0 
join #tab ts on ts.id = t.s 
join #tab te on te.id = t.e 
group by t.s, t.e, ts.date, te.date 

Live example at data.SE

1

Using this sample data:

CREATE TABLE MyTable (ID INT, Date DATETIME, Allocation INT); 
INSERT INTO MyTable VALUES (1, {d '2012-01-01'}, 0); 
INSERT INTO MyTable VALUES (2, {d '2012-01-02'}, 2); 
INSERT INTO MyTable VALUES (3, {d '2012-01-03'}, 0); 
INSERT INTO MyTable VALUES (4, {d '2012-01-04'}, 0); 
INSERT INTO MyTable VALUES (5, {d '2012-01-05'}, 0); 
INSERT INTO MyTable VALUES (6, {d '2012-01-06'}, 5); 
GO 

Try this:

WITH DateGroups (ID, Date, Allocation, SeedID) AS (
    SELECT MyTable.ID, MyTable.Date, MyTable.Allocation, MyTable.ID 
     FROM MyTable 
     LEFT JOIN MyTable Prev ON Prev.Date = DATEADD(d, -1, MyTable.Date) 
          AND Prev.Allocation = 0 
    WHERE Prev.ID IS NULL 
     AND MyTable.Allocation = 0 
    UNION ALL 
    SELECT MyTable.ID, MyTable.Date, MyTable.Allocation, DateGroups.SeedID 
     FROM MyTable 
     JOIN DateGroups ON MyTable.Date = DATEADD(d, 1, DateGroups.Date) 
    WHERE MyTable.Allocation = 0 

), StartDates (ID, StartDate, DayCount) AS (
    SELECT SeedID, MIN(Date), COUNT(ID) 
     FROM DateGroups 
    GROUP BY SeedID 

), EndDates (ID, EndDate) AS (
    SELECT SeedID, MAX(Date) 
     FROM DateGroups 
    GROUP BY SeedID 

) 
SELECT StartDates.StartDate, EndDates.EndDate, StartDates.DayCount 
    FROM StartDates 
    JOIN EndDates ON StartDates.ID = EndDates.ID; 

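The first part of the query is a recursive SELECT. It is anchored by all rows that have Allocation = 0 and whose previous day either does not exist or has Allocation != 0. This effectively returns IDs 1 and 3, which are the start dates of the periods you want returned.

The recursive part of the query starts from the anchor rows and finds all subsequent dates that also have Allocation = 0. SeedID keeps track of the anchor's ID through all iterations.

The result so far is this: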

ID          Date                    Allocation  SeedID
----------- ----------------------- ----------- -----------
1           2012-01-01 00:00:00.000 0           1
3           2012-01-03 00:00:00.000 0           3
4           2012-01-04 00:00:00.000 0           3
5           2012-01-05 00:00:00.000 0           3

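The next subquery uses a simple GROUP BY to pick out the start date for each SeedID, and also counts the days.

The last subquery does the same thing for the end dates, but this time the day count is not needed because we already have it.

The final SELECT query joins these two together to combine the start and end dates, and returns them along with the day count.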
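For readers who find the anchor/recursion split easier to follow procedurally, here is a rough Python sketch of the same idea. It only mirrors the logic described above; `by_date`, `is_zero` and `groups` are hypothetical names, not part of the query.

```python
from datetime import date, timedelta

# Sample rows from the question: (id, date, allocation).
rows = [
    (1, date(2012, 1, 1), 0),
    (2, date(2012, 1, 2), 2),
    (3, date(2012, 1, 3), 0),
    (4, date(2012, 1, 4), 0),
    (5, date(2012, 1, 5), 0),
    (6, date(2012, 1, 6), 5),
]
by_date = {d: alloc for _, d, alloc in rows}
one_day = timedelta(days=1)

def is_zero(d):
    """True if a row exists for day d and its allocation is 0."""
    return by_date.get(d) == 0

# Anchor part: Allocation = 0 rows whose previous day is missing or non-zero.
# The seed id plays the role of SeedID.
groups = {}
for row_id, d, alloc in rows:
    if alloc == 0 and not is_zero(d - one_day):
        groups[row_id] = [d]

# Recursive part: extend each seed while the next day also has Allocation = 0.
for dates in groups.values():
    while is_zero(dates[-1] + one_day):
        dates.append(dates[-1] + one_day)

# GROUP BY SeedID: MIN(Date), MAX(Date), COUNT(ID).
result = [(min(ds), max(ds), len(ds)) for ds in groups.values()]
```

Because every row in a group carries its seed's id, the start date, end date and day count all fall out of a single grouping step.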

1

Try this and see if it works for you. Here SDATE corresponds to the DATE column in your table.

SELECT SDATE, 
CASE WHEN (SELECT COUNT(*)-1 FROM TABLE1 WHERE ID BETWEEN TBL1.ID AND (SELECT MIN(ID) FROM TABLE1 WHERE ID > TBL1.ID AND ALLOCATION!=0)) >0 THEN(
CASE WHEN (SELECT SDATE FROM TABLE1 WHERE ID =(SELECT MAX(ID) FROM TABLE1 WHERE ID >TBL1.ID AND ID<(SELECT MIN(ID) FROM TABLE1 WHERE ID > TBL1.ID AND ALLOCATION!=0))) IS NULL THEN SDATE 
ELSE (SELECT SDATE FROM TABLE1 WHERE ID =(SELECT MAX(ID) FROM TABLE1 WHERE ID >TBL1.ID AND ID<(SELECT MIN(ID) FROM TABLE1 WHERE ID > TBL1.ID AND ALLOCATION!=0))) END 
)ELSE (SELECT SDATE FROM TABLE1 WHERE ID = (SELECT MAX(ID) FROM TABLE1 WHERE ID > TBL1.ID))END AS EDATE 
,CASE WHEN (SELECT COUNT(*)-1 FROM TABLE1 WHERE ID BETWEEN TBL1.ID AND (SELECT MIN(ID) FROM TABLE1 WHERE ID > TBL1.ID AND ALLOCATION!=0)) <0 THEN 
(SELECT COUNT(*) FROM TABLE1 WHERE ID BETWEEN TBL1.ID AND (SELECT MAX(ID) FROM TABLE1 WHERE ID > TBL1.ID)) ELSE 
(SELECT COUNT(*)-1 FROM TABLE1 WHERE ID BETWEEN TBL1.ID AND (SELECT MIN(ID) FROM TABLE1 WHERE ID > TBL1.ID AND ALLOCATION!=0)) END AS DAYCOUNT 
FROM TABLE1 TBL1 WHERE ALLOCATION = 0 
AND (((SELECT ALLOCATION FROM TABLE1 WHERE ID=(SELECT MAX(ID) FROM TABLE1 WHERE ID < TBL1.ID))<> 0) OR (SELECT MAX(ID) FROM TABLE1 WHERE ID < TBL1.ID)IS NULL);