2012-08-16
1

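I have been struggling with a problem that really should be simple, but after a full week of reading, searching and experimenting, my colleagues and I still cannot find a proper solution. :(

Islands and gaps, T-SQL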

The problem: we have a table with two values:
employee number (P_ID, int) <--- the employee's identifier
date (starttime, datetime) <--- the time the employee checked in
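  • We need to know which periods each employee has worked.
  • When two dates are less than @gap days apart, they belong to the same period.
  • An employee can have multiple records on any given day, but I only need to know which dates he worked; I am not interested in the time part.
  • As soon as there is a gap > @gap days, the next date is considered the start of a new range.
  • A range is at least 1 day long (example: 21-09-2011 | 21-09-2011) but has no maximum length. (An employee who checks in every @gap - 1 days should end up with a single period running from the first day he checked in until today.)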

What we think we need are the islands in this table, separated wherever the gap in days is larger than a variable (@gap = 30 means 30 days).
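
The rules above can be restated as a short sketch. It is written in Python rather than T-SQL only so it can be run without a SQL Server instance, and it is not the code from any of the answers. The function name `find_periods` and the `today` parameter are made up for illustration. Two assumptions go beyond what the question states: a difference of exactly `gap_days` keeps two dates in the same period, and the last period runs until `today` only when the most recent check-in is no more than `gap_days` old.

```python
from datetime import date, datetime
from itertools import groupby


def find_periods(checkins, gap_days, today=None):
    """Group (p_id, datetime) check-ins into periods of work per employee.

    Returns a list of (p_id, first_day, last_day) tuples.
    """
    # Keep only the date part and drop duplicate days per employee.
    days = sorted({(p_id, ts.date()) for p_id, ts in checkins})
    periods = []
    for p_id, rows in groupby(days, key=lambda row: row[0]):
        dates = [day for _, day in rows]
        start = prev = dates[0]
        for day in dates[1:]:
            # Assumption: a difference of exactly gap_days stays in the same period.
            if (day - prev).days > gap_days:
                periods.append((p_id, start, prev))
                start = day
            prev = day
        # Assumption: the last period is still open (ends today) when the most
        # recent check-in is no more than gap_days before `today`.
        if today is not None and 0 <= (today - prev).days <= gap_days:
            periods.append((p_id, start, today))
        else:
            periods.append((p_id, start, prev))
    return periods


# A subset of the sample rows from the question (employees 12345 and 99999).
sample = [
    (12345, datetime(2011, 6, 27, 10, 0)),
    (99999, datetime(2012, 5, 1, 4, 50)),
    (12345, datetime(2011, 6, 27, 10, 30)),
    (12345, datetime(2011, 6, 28, 11, 0)),
    (12345, datetime(2011, 7, 21, 9, 0)),
    (99999, datetime(2012, 5, 3, 23, 15)),
    (12345, datetime(2011, 9, 21, 12, 0)),
    (12345, datetime(2011, 9, 21, 17, 0)),
    (99999, datetime(2012, 5, 6, 11, 5)),
    (99999, datetime(2012, 5, 20, 12, 45)),
    (12345, datetime(2012, 7, 7, 14, 0)),
    (99999, datetime(2012, 6, 1, 13, 55)),
    (12345, datetime(2012, 8, 13, 13, 0)),
]

for row in find_periods(sample, gap_days=30, today=date(2012, 8, 16)):
    print(row)
```

Sorting once and walking each employee's dates in order keeps the work linear after the sort, which matters for a table with millions of rows.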

So, an example:

Source table:

-----P_ID----|----starttime----
12121 | 24-03-2009 7:30
12121 | 24-03-2009 14:25
12345 | 27-06-2011 10:00
99999 | 01-05-2012 4:50
12345 | 27-06-2011 10:30
12345 | 28-06-2011 11:00
98765 | 13-04-2012 10:00
12345 | 21-07-2011 9:00
99999 | 03-05-2012 23:15
12345 | 21-09-2011 12:00
45454 | 12-07-2010 8:00
12345 | 21-09-2011 17:00
99999 | 06-05-2012 11:05
99999 | 20-05-2012 12:45
98765 | 26-04-2012 16:00
12345 | 07-07-2012 14:00
99999 | 01-06-2012 13:55
12345 | 13-08-2012 13:00

Now the result I need is:

Periods:

-----P_ID----|----start----|----end----
12121 | 24-03-2009 | 24-03-2009
12345 | 27-06-2012 | 21-07-2012
12345 | 21-09-2012 | 21-09-2012
12345 | 07-07-2012 | (today) OR 13-08-2012 <-- (less than @gap days ago) OR (the last date in the table)
45454 | 12-07-2010 | 12-07-2010
45454 | 17-06-2012 | 17-06-2012
98765 | 13-04-2012 | 26-04-2012
99999 | 01-05-2012 | 01-06-2012

I hope this explains it clearly. Thanks for reading this far; it would be great if you could contribute!

+0

What value of '@gap' applies to the result set above? – 2012-08-16 09:35:44

+0

Your result set doesn't make sense. Can you explain the result set entries for 12345? – 2012-08-16 10:08:40

+0

I don't think the result sets for 12345 (should be 4 rows) or 45454 (should be 1 row) are quite right. – 2012-08-16 10:21:40

Answers

0

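Jon most definitely pointed us in the right direction. Performance was terrible, though (there are 4 million records in the database), and it looked like we had missed some information. Using everything we learned from you, we came up with the following solution. It uses elements of all the suggested answers and works through 3 temp tables before finally producing the results, but both the performance and the data it generates are good enough.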

declare @gap int 
declare @Employee_id int 

set @gap = 30 
set dateformat dmy 
--------------------------------------------------------------- #temp1 -------------------------------------------------- 
CREATE TABLE #temp1 (EmployeeID int, starttime date) 
INSERT INTO #temp1 (EmployeeID, starttime) 

select distinct ck.Employee_id, 
       cast(ck.starttime as date) 
from SERVER1.DB1.dbo.checkins ck -- alias must match the ck.* references above and in the join
     inner join SERVER1.DB1.dbo.Team t on ck.team_id = t.id 
where t.productive = 1 

--------------------------------------------------------------- #temp2 -------------------------------------------------- 

create table #temp2 (ROWNR int, Employeeid int, ENDOFCHECKIN datetime, FIRSTCHECKIN datetime) 
INSERT INTO #temp2 

select Row_number() OVER (partition by EmployeeID ORDER BY t.prev) + 1 as ROWNR, 
      EmployeeID, 
      DATEADD(DAY, 1, t.Prev) AS start_gap, 
      DATEADD(DAY, 0, t.next) AS end_gap 
from 
      (
        select a.EmployeeID, 
            a.starttime as Prev, 
            (
            select min(b.starttime) 
            from #temp1 as b 
            where starttime > a.starttime and b.EmployeeID = a.EmployeeID 
           ) as Next 
from #temp1 as a) as t 

where datediff(day, prev, next) > @gap -- use the variable rather than a hard-coded 30
group by  EmployeeID, 
        t.Prev, 
        t.next 
union -- add first known date for Employee 

select  1 as ROWNR, 
      EmployeeID, 
      NULL, 
      min(starttime) 
from #temp1 ct 
group by ct.EmployeeID 

--------------------------------------------------------------- #temp3 -------------------------------------------------- 

create table #temp3 (ROWNR int, Employeeid int, ENDOFCHECKIN datetime, STARTOFCHECKIN datetime) 
INSERT INTO #temp3 

select ROWNR, 
     Employeeid, 
     ENDOFCHECKIN, 
     FIRSTCHECKIN 
from #temp2 

union -- add last known date for Employee 

select  (select count(*) from #temp2 b where Employeeid = ct.Employeeid)+1 as ROWNR, 
      ct.Employeeid, 
      (select dateadd(d,1,max(starttime)) from #temp1 c where Employeeid = ct.Employeeid), 
      NULL 
from #temp2 ct 
group by ct.EmployeeID 

---------------------------------------finally check our data------------------------------------------------- 


select    a1.Employeeid, 
        a1.STARTOFCHECKIN as STARTOFCHECKIN, 
        ENDOFCHECKIN = CASE WHEN b1.ENDOFCHECKIN <= a1.STARTOFCHECKIN THEN a1.ENDOFCHECKIN ELSE b1.ENDOFCHECKIN END, 
        year(a1.STARTOFCHECKIN) as JaarSTARTOFCHECKIN, 
        JaarENDOFCHECKIN = CASE WHEN b1.ENDOFCHECKIN <= a1.STARTOFCHECKIN THEN year(a1.ENDOFCHECKIN) ELSE year(b1.ENDOFCHECKIN) END, 
        Month(a1.STARTOFCHECKIN) as MaandSTARTOFCHECKIN, 
        MaandENDOFCHECKIN = CASE WHEN b1.ENDOFCHECKIN <= a1.STARTOFCHECKIN THEN month(a1.ENDOFCHECKIN) ELSE month(b1.ENDOFCHECKIN) END, 
        (year(a1.STARTOFCHECKIN)*100)+month(a1.STARTOFCHECKIN) as JaarMaandSTARTOFCHECKIN, 
        JaarMaandENDOFCHECKIN = CASE WHEN b1.ENDOFCHECKIN <= a1.STARTOFCHECKIN THEN (year(a1.ENDOFCHECKIN)*100)+month(a1.ENDOFCHECKIN) ELSE (year(b1.ENDOFCHECKIN)*100)+month(b1.ENDOFCHECKIN) END, 
        datediff(M,a1.STARTOFCHECKIN,b1.ENDOFCHECKIN) as MONTHSCHECKEDIN 
from #temp3 a1 
     full outer join #temp3 b1 on a1.ROWNR = b1.ROWNR -1 and a1.Employeeid = b1.Employeeid 
where not (a1.STARTOFCHECKIN is null AND b1.ENDOFCHECKIN is null) 
order by a1.Employeeid, a1.STARTOFCHECKIN 
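
The idea behind the temp tables can be shown in a few lines: first find the neighbouring dates that are too far apart, then bracket those gaps with the first and last known dates to get the periods. The following is a Python sketch of that idea for one employee, not a translation of the script above. The function name `periods_from_gaps` is invented for illustration, and unlike the script, which stores the day after the last check-in as ENDOFCHECKIN, this sketch assumes inclusive end dates.

```python
from datetime import date


def periods_from_gaps(dates, gap_days):
    """Return inclusive (start, end) periods for one employee's check-in dates."""
    dates = sorted(set(dates))
    if not dates:
        return []
    # Step 1 (the role of #temp2): neighbouring dates more than gap_days apart.
    gaps = [(prev, nxt) for prev, nxt in zip(dates, dates[1:])
            if (nxt - prev).days > gap_days]
    # Step 2 (the role of #temp3): a period starts at the first date or right
    # after a gap, and ends right before a gap or at the last date.
    starts = [dates[0]] + [nxt for _, nxt in gaps]
    ends = [prev for prev, _ in gaps] + [dates[-1]]
    return list(zip(starts, ends))


# Distinct check-in dates of employee 12345 from the question.
dates_12345 = [
    date(2011, 6, 27), date(2011, 6, 28), date(2011, 7, 21),
    date(2011, 9, 21), date(2012, 7, 7), date(2012, 8, 13),
]
for period in periods_from_gaps(dates_12345, gap_days=30):
    print(period)
```

Because the list of gaps is usually much shorter than the list of check-ins, pairing starts and ends this way avoids joining the full table to itself.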

1

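I have knocked up a rough script that should get you started. I haven't bothered refining the datetimes, and the endpoint comparisons may need adjusting.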

select 
    P_ID, 
    src.starttime, 
    endtime = case when src.starttime <> lst.starttime or lst.starttime < DATEADD(dd,-1 * @gap,GETDATE()) then lst.starttime else GETDATE() end, 
    frst.starttime, 
    lst.starttime 
from @SOURCETABLE src 
outer apply (select starttime = MIN(starttime) from @SOURCETABLE sub where src.p_id = sub.p_id and sub.starttime > DATEADD(dd,-1 * @gap,src.starttime)) frst 
outer apply (select starttime = MAX(starttime) from @SOURCETABLE sub where src.p_id = sub.p_id and src.starttime > DATEADD(dd,-1 * @gap,sub.starttime)) lst 
where src.starttime = frst.starttime 
order by P_ID, src.starttime 

I get the output below, which differs from yours, but I think it is OK:

P_ID  starttime    endtime     starttime    starttime 
----------- ----------------------- ----------------------- ----------------------- ----------------------- 
12121  2009-03-24 07:30:00.000 2009-03-24 14:25:00.000 2009-03-24 07:30:00.000 2009-03-24 14:25:00.000 
12345  2011-06-27 10:00:00.000 2011-07-21 09:00:00.000 2011-06-27 10:00:00.000 2011-07-21 09:00:00.000 
12345  2011-09-21 12:00:00.000 2011-09-21 17:00:00.000 2011-09-21 12:00:00.000 2011-09-21 17:00:00.000 
12345  2012-07-07 14:00:00.000 2012-07-07 14:00:00.000 2012-07-07 14:00:00.000 2012-07-07 14:00:00.000 
12345  2012-08-13 13:00:00.000 2012-08-16 11:23:25.787 2012-08-13 13:00:00.000 2012-08-13 13:00:00.000 
45454  2010-07-12 08:00:00.000 2010-07-12 08:00:00.000 2010-07-12 08:00:00.000 2010-07-12 08:00:00.000 
98765  2012-04-13 10:00:00.000 2012-04-26 16:00:00.000 2012-04-13 10:00:00.000 2012-04-26 16:00:00.000 

The last two output columns are the results of the outer apply sections and are only there for debugging.

This is based on the following setup:

declare @gap int 
set @gap = 30 

set dateformat dmy 
-----P_ID----|----starttime---- 
declare @SOURCETABLE table (P_ID int, starttime datetime) 
insert @SourceTable values 
(12121,'24-03-2009 7:30'), 
(12121,'24-03-2009 14:25'), 
(12345,'27-06-2011 10:00'), 
(12345,'27-06-2011 10:30'), 
(12345,'28-06-2011 11:00'), 
(98765,'13-04-2012 10:00'), 
(12345,'21-07-2011 9:00'), 
(12345,'21-09-2011 12:00'), 
(45454,'12-07-2010 8:00'), 
(12345,'21-09-2011 17:00'), 
(98765,'26-04-2012 16:00'), 
(12345,'07-07-2012 14:00'), 
(12345,'13-08-2012 13:00') 

UPDATE: A slight rethink. This version uses a CTE to work out the gap forwards and backwards from each item, then aggregates those:

--Get the gap between each starttime and the next and prev (use 999 to indicate non-closed intervals) 
;WITH CTE_Gaps As ( 
    select 
     p_id, 
     src.starttime, 
     nextgap = coalesce(DATEDIFF(dd,src.starttime,nxt.starttime),999), --Gap to the next entry 
     prevgap = coalesce(DATEDIFF(dd,prv.starttime,src.starttime),999), --Gap to the previous entry 
     isold = case when DATEDIFF(dd,src.starttime,getdate()) > @gap then 1 else 0 end --Is starttime more than gap days ago? 
    from 
     @SOURCETABLE src 
     cross apply (select starttime = MIN(starttime) from @SOURCETABLE sub where src.p_id = sub.p_id and sub.starttime > src.starttime) nxt 
     cross apply (select starttime = max(starttime) from @SOURCETABLE sub where src.p_id = sub.p_id and sub.starttime < src.starttime) prv 
) 
--select * from CTE_Gaps 
select 
     p_id, 
     starttime = min(gap.starttime), 
     endtime = nxt.starttime 
    from 
     CTE_Gaps gap 
     --Find the next starttime where its gap to the next > @gap 
     cross apply (select starttime = MIN(sub.starttime) from CTE_Gaps sub where gap.p_id = sub.p_id and sub.starttime >= gap.starttime and sub.nextgap > @gap) nxt 
group by P_ID, nxt.starttime 
order by P_ID, nxt.starttime 
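
To make the aggregation step easier to follow, here is a Python sketch of the same idea for a single employee: each date is mapped to the first date at or after it whose gap to the next entry exceeds the threshold, and the dates are then grouped by that end date. The function name `periods_by_next_gap` is invented for illustration. As in the query, 999 marks the last entry, so the sketch assumes `gap_days` is below 999.

```python
from datetime import date

OPEN = 999  # sentinel for "no next entry", mirroring the 999 in the query


def periods_by_next_gap(dates, gap_days):
    """Return inclusive (start, end) periods for one employee's check-in dates."""
    dates = sorted(set(dates))
    # Gap in days from each date to the next one; the last date gets the sentinel.
    next_gap = {d: (n - d).days for d, n in zip(dates, dates[1:])}
    if dates:
        next_gap[dates[-1]] = OPEN
    earliest_start = {}
    for d in dates:
        # First date at or after d that is followed by a gap larger than gap_days.
        end = min(e for e in dates if e >= d and next_gap[e] > gap_days)
        # Group by the end date and keep the earliest start, like MIN + GROUP BY.
        earliest_start[end] = min(earliest_start.get(end, d), d)
    return sorted((start, end) for end, start in earliest_start.items())


# Distinct check-in dates of employee 99999 from the question.
dates_99999 = [
    date(2012, 5, 1), date(2012, 5, 3), date(2012, 5, 6),
    date(2012, 5, 20), date(2012, 6, 1),
]
print(periods_by_next_gap(dates_99999, gap_days=30))
```

Like the correlated subqueries in the query, the inner search makes this quadratic in the number of dates per employee, which is fine for a demonstration but worth keeping in mind for large tables.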

+0

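Jon, this code does what we are looking for, except for one thing: when there are no gaps larger than the specified period, we get wrong results. Here is an example. John has been working with us since 9 February 2009 and has never been away for more than 10 days. When we run this script, he shows up with one period: the start date is the day he first checked in, and the end date is (startdate + @gap) rather than today or the date he last checked in. So when there are no gaps larger than @gap, the dates shown are always (startdate, startdate + @gap) instead of (startdate, today). How can we compensate for this? – Henrov 2012-08-16 13:27:07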

+1

Include this use case in the sample data in the question, and show the desired output. – 2012-08-16 13:27:44

+0

Thanks Jon! We have added P_ID 99999. – Henrov 2012-08-16 13:43:07