2010-02-22 53 views
2

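I have a table with the columns Id and EmployeeID. The data has the following characteristic: in some places the same employee appears on several rows whose Id values are consecutive. For example, take a look at this block of data: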

Id | EmployeeID 
--------------- 
1 |  1 
2 |  1 
3 |  2 
4 |  5 
5 |  1 
6 |  1 

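I want to build a query that finds the blocks of rows that share the same EmployeeID and have consecutive Id values (with a minimum of x records per block). So far I have come up with: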

SELECT EmployeeID, MIN(Id), MAX(Id), COUNT(*) 
FROM recs 
GROUP BY EmployeeID 
HAVING COUNT(*) > 5 AND 
     MAX(Id) - MIN(Id) + 1 = COUNT(*) 

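This query returns some of the blocks, but not all of them: it only works when an employee appears in exactly one block. Can anyone come up with a solution that returns all the distinct blocks for each employee?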
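To make the limitation concrete, here is a minimal sketch that runs the query above against the sample rows. It assumes Python's built-in sqlite3 module instead of the asker's real database, and it lowers the threshold to a hypothetical `min_count` of 2 so the six sample rows can match at all.

```python
import sqlite3

# Sample rows from the question: (Id, EmployeeID).
ROWS = [(1, 1), (2, 1), (3, 2), (4, 5), (5, 1), (6, 1)]


def grouped_blocks(rows, min_count=2):
    """Run the question's GROUP BY query; min_count is an illustrative threshold."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE recs (Id INTEGER PRIMARY KEY, EmployeeID INTEGER NOT NULL)"
    )
    con.executemany("INSERT INTO recs (Id, EmployeeID) VALUES (?, ?)", rows)
    result = con.execute(
        """
        SELECT EmployeeID, MIN(Id), MAX(Id), COUNT(*)
        FROM recs
        GROUP BY EmployeeID
        HAVING COUNT(*) >= ?
           AND MAX(Id) - MIN(Id) + 1 = COUNT(*)
        """,
        (min_count,),
    ).fetchall()
    con.close()
    return result


# Employee 1 has two blocks (1-2 and 5-6), so MIN=1, MAX=6, COUNT=4 and the
# equality test fails: neither block is reported.
print(grouped_blocks(ROWS))      # prints []

# With only the first four rows, employee 1 has a single block and is found.
print(grouped_blocks(ROWS[:4]))  # prints [(1, 1, 2, 2)]
```

Grouping by EmployeeID alone collapses all of an employee's rows into one group, which is why a second block makes the first one disappear as well.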

Answers

1

Not the best solution, but it should work (this example checks the 3 Ids immediately preceding each row):

-- Keep each row whose three preceding Ids (Id-1, Id-2, Id-3) belong to the same employee.
SELECT Id, EmployeeID FROM 
(
SELECT r.Id, r.EmployeeID, 
(SELECT COUNT(1) FROM recs r1 WHERE r1.EmployeeID = r.EmployeeID AND r1.Id = r.Id-1) AS c1, 
(SELECT COUNT(1) FROM recs r2 WHERE r2.EmployeeID = r.EmployeeID AND r2.Id = r.Id-2) AS c2, 
(SELECT COUNT(1) FROM recs r3 WHERE r3.EmployeeID = r.EmployeeID AND r3.Id = r.Id-3) AS c3 
FROM recs r) tab1 
WHERE (tab1.c1+tab1.c2+tab1.c3 = 3); 

I am assuming that Id is a primary (or unique) key. If it is not, you should change each subquery to SELECT IF(COUNT(1) > 0, 1, 0) ...

2

Join the table to itself on table1.Id = table2.Id + 1 and table1.employeeid = table2.employeeid.
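A rough sketch of what that self-join could look like, run through Python's sqlite3 module against the sample rows from the question. The aliases `t1` and `t2` and the helper name `adjacent_pairs` are made up for illustration; the join condition is the one described above.

```python
import sqlite3

# Sample rows from the question: (Id, EmployeeID).
ROWS = [(1, 1), (2, 1), (3, 2), (4, 5), (5, 1), (6, 1)]


def adjacent_pairs(rows):
    """Return (previous Id, Id, EmployeeID) for neighbouring rows of one employee."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE recs (Id INTEGER PRIMARY KEY, EmployeeID INTEGER NOT NULL)"
    )
    con.executemany("INSERT INTO recs (Id, EmployeeID) VALUES (?, ?)", rows)
    pairs = con.execute(
        """
        SELECT t2.Id, t1.Id, t1.EmployeeID
        FROM recs AS t1
        JOIN recs AS t2
          ON t1.Id = t2.Id + 1
         AND t1.EmployeeID = t2.EmployeeID
        ORDER BY t2.Id
        """
    ).fetchall()
    con.close()
    return pairs


print(adjacent_pairs(ROWS))  # prints [(1, 2, 1), (5, 6, 1)]
```

Each result row only says that two neighbouring Ids share an employee. Turning those pairs into blocks of a minimum length needs a further step.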

+0

This is a first step, but I still need to get the blocks that have at least 5 consecutive Ids. Your solution will fetch all consecutive rows. – Anax 2010-02-23 00:47:40

0

Use a temporary table for this. Try this solution:

SELECT EmployeeID, MIN(Id) AS Min, MAX(Id) AS Max, COUNT(*) AS Count 
INTO #TempTable 
FROM recs 
GROUP BY EmployeeID 

SELECT * FROM #TempTable WHERE 
Count > 5 AND 
     Max - Min + 1 = Count 

EDITED ANSWER

Please try this:

-- TABLE is a reserved word, so the alias has to be bracketed.
SELECT * FROM ( 
SELECT EmployeeID, MIN(Id) AS min, MAX(Id) AS max, COUNT(*) AS count 
    FROM recs 
    GROUP BY EmployeeID) AS [Table] 
    WHERE [Table].count > 5 AND 
      [Table].max - [Table].min + 1 = [Table].count 
+0

I believe this does exactly the same as the query I provided. It only fetches a block when the employee appears in a single block. – Anax 2010-02-23 08:04:20

+0

Please see the edited answer. – 2010-02-23 08:30:18

+0

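This still doesn't work. Try it with the dataset provided (replacing Table.count > 5 with Table.count >= 2) to see for yourself. You are still approaching the problem in the same way. – Anax 2010-02-23 12:43:27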

0

Wow, this is a real puzzle. I'm sure this has all sorts of holes, but here is one possible solution. First, our test data:

If Exists(Select 1 From INFORMATION_SCHEMA.TABLES Where TABLE_NAME = 'recs') 
    DROP TABLE recs 
GO 
Create Table recs 
(
    Id int not null 
    , EmployeeId int not null 
) 
Insert recs(Id, EmployeeId) 
Values (1,1) ,(2,1) ,(3,1) ,(4,2) ,(5,5) ,(6,1) ,(7,1) ,(8,1) ,(10,1) 
    ,(11,1) ,(12,1) ,(13,2) ,(14,2) ,(15,2) ,(16,2) 

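Next, you will need a Tally or Numbers table containing a sequence of numbers. I only loaded the numbers below 500 into this one, but you may need more depending on the size of your data. The largest number in the Tally table should be greater than the largest Id in the recs table.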

Create Table dbo.Tally(Num int not null) 
GO 
;With Numbers As 
    (
    Select ROW_NUMBER() OVER (ORDER BY s1.object_id) As Num 
    From sys.columns as s1 
    ) 
Insert dbo.Tally(Num) 
Select Num 
From Numbers 
Where Num < 500 

Now for the actual solution. Basically, I use a series of CTEs to deduce the start and end points of the consecutive sequences.

Declare @SequenceSize int = 3 -- minimum block length; 3 is only an example value
; With 
    Employees As 
    (
    Select Distinct EmployeeId 
    From dbo.Recs 
    ) 
    , SequenceGaps As 
    (
    Select E.EmployeeId, T.Num, R1.Id 
    From dbo.Tally As T 
     Cross Join Employees As E 
     Left Join dbo.recs As R1 
      On R1.EmployeeId = E.EmployeeId 
       And R1.Id = T.Num 
    Where T.Num <= ( 
     Select Max(R3.Id) 
     From dbo.Recs As R3 
      Where R3.EmployeeId = E.EmployeeId 
      ) 
    ) 
    , EndIds As 
    (
    Select S.EmployeeId 
     , Case When S1.Id Is Null Then S.Id End As [End] 
    From SequenceGaps As S 
     Join SequenceGaps As S1 
      On S1.EmployeeId = S.EmployeeId 
       And S1.Num = (S.Num + 1) 
    Where S.Id Is Not Null 
     And S1.Id Is Null 
    Union All 
    Select S.EmployeeId, Max(Id) 
    From SequenceGaps As S 
    Where S.Id Is Not Null 
    Group By S.EmployeeId 
    ) 
    , SequencedEndIds As 
    (
    Select EmployeeId, [End] 
     , ROW_NUMBER() OVER (PARTITION BY EmployeeId ORDER BY [End]) As SequenceNum 
    From EndIds 
    ) 
    , StartIds As 
    (
    Select S.EmployeeId 
     , Case When S1.Id Is Null Then S.Id End As [Start] 
    From SequenceGaps As S 
     Join SequenceGaps As S1 
      On S1.EmployeeId = S.EmployeeId 
       And S1.Num = (S.Num - 1) 
    Where S.Id Is Not Null 
     And S1.Id Is Null 
    Union All 
    Select S.EmployeeId, 1 
    From SequenceGaps As S 
    Where S.Id = 1 
    ) 
    , SequencedStartIds As 
    (
    Select EmployeeId, [Start] 
     , ROW_NUMBER() OVER (PARTITION BY EmployeeId ORDER BY [Start]) As SequenceNum 
    From StartIds 
    ) 
    , SequenceRanges As 
    (
    Select S1.EmployeeId, Start, [End] 
    From SequencedStartIds As S1 
     Join SequencedEndIds As S2 
      On S2.EmployeeId = S1.EmployeeId 
       And S2.SequenceNum = S1.SequenceNum 
    ) 
Select * 
From SequenceGaps As SG 
Where Exists(
     Select 1 
     From SequenceRanges As SR 
     Where SR.EmployeeId = SG.EmployeeId 
      And SG.Id Between SR.Start And SR.[End] 
      And (SR.[End] - SR.[Start] + 1) >= @SequenceSize 
     ) 

By changing @SequenceSize, which is used in the WHERE clause of the final statement, you can control which sequences are returned.
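For comparison, here is a much smaller sketch of the same start/end idea. It is not the CTE query above: it swaps the Tally table for NOT EXISTS checks, runs on Python's built-in sqlite3 module rather than SQL Server, and assumes Id is unique. The names `find_blocks`, `StartId`, `EndId` and `min_size` are made up for the example; `min_size` plays the role of @SequenceSize.

```python
import sqlite3

# Test data from the answer above: (Id, EmployeeId).
ROWS = [(1, 1), (2, 1), (3, 1), (4, 2), (5, 5), (6, 1), (7, 1), (8, 1),
        (10, 1), (11, 1), (12, 1), (13, 2), (14, 2), (15, 2), (16, 2)]

BLOCKS_SQL = """
SELECT EmployeeId, StartId, EndId
FROM (
    SELECT s.EmployeeId,
           s.Id AS StartId,
           -- End of the block: the first row at or after the start whose
           -- successor (Id + 1) does not belong to the same employee.
           (SELECT MIN(e.Id)
              FROM recs AS e
             WHERE e.EmployeeId = s.EmployeeId
               AND e.Id >= s.Id
               AND NOT EXISTS (SELECT 1 FROM recs AS n
                                WHERE n.EmployeeId = e.EmployeeId
                                  AND n.Id = e.Id + 1)) AS EndId
    FROM recs AS s
    -- Start of a block: the predecessor (Id - 1) is missing or belongs
    -- to another employee.
    WHERE NOT EXISTS (SELECT 1 FROM recs AS p
                       WHERE p.EmployeeId = s.EmployeeId
                         AND p.Id = s.Id - 1)
) AS blocks
WHERE EndId - StartId + 1 >= ?
ORDER BY EmployeeId, StartId
"""


def find_blocks(rows, min_size):
    """Return (EmployeeId, StartId, EndId) for blocks of at least min_size rows."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE recs (Id INTEGER PRIMARY KEY, EmployeeId INTEGER NOT NULL)"
    )
    con.executemany("INSERT INTO recs (Id, EmployeeId) VALUES (?, ?)", rows)
    blocks = con.execute(BLOCKS_SQL, (min_size,)).fetchall()
    con.close()
    return blocks


print(find_blocks(ROWS, 3))
# prints [(1, 1, 3), (1, 6, 8), (1, 10, 12), (2, 13, 16)]
```

Unlike the grouped query in the question, this reports every block for employee 1 separately. The correlated subqueries may be slow on a large table, so treat it as an illustration rather than a tuned query.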