2017-05-26 79 views
0

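Collapse consecutive similar records into a single record

I have records of people from a legacy system that I am trying to convert to a new system. In the old system, a person can end up with multiple records at the same location. They can also move from one location to another and then return to a previous location. Here is some sample data: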

PersonID | LocationID | StartDate | EndDate 
1   | 1   | 1980-07-30 | 2007-07-16 
1   | 1   | 2007-07-16 | 2008-01-30 
1   | 2   | 2008-01-30 | 2009-03-02 
1   | 2   | 2009-03-02 | 2009-11-06 
1   | 3   | 2014-07-16 | 2015-01-16 
1   | 1   | 2016-01-26 | 2999-12-31 

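I want to collapse this data so that I get one date range for each run of consecutive records with the same LocationID. For the data above, this is what I expect: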

PersonID | LocationID | StartDate | EndDate 
1   | 1   | 1980-07-30 | 2008-01-30 
1   | 2   | 2008-01-30 | 2009-11-06 
1   | 3   | 2014-07-16 | 2015-01-16 
1   | 1   | 2016-01-26 | 2999-12-31 

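I'm not sure how to do this. I previously tried joining each record to the previous one, but that only works when there are two consecutive records at a location, not three or more (there can be any number of consecutive records).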

select 
    a.PersonID, 
    a.LocationID, 
    a.StartDate, 
    a.EndDate, 
    case when a.LocationID = b.LocationID then a.PK_ID else b.PK_ID end as NewID 
from employees a 
left outer join employees b 
on a.PersonID = b.PersonID 
and a.PK_ID = b.PK_ID - 1 

So, how can I write a query that gets the results I need?

Note: we use '2999-12-31' as our "NULL" value for date fields.

Answers

1

This is a classic gaps-and-islands problem (edit: corrected to handle the large span up to 2999).

Select [PersonID] 
     ,[LocationID] 
     ,[StartDate] = min(D) 
     ,[EndDate] = max(D) 
From (
     Select * 
       ,Grp = Row_Number() over (Order By D) - Row_Number() over (Partition By [PersonID],[LocationID] Order By D) 
     from YourTable A 
     Cross Apply (
         Select Top (DateDiff(DAY,A.[StartDate],A.[EndDate])+1) D=DateAdd(DAY,-1+Row_Number() Over (Order By (Select Null)),A.[StartDate]) 
         From master..spt_values n1,master..spt_values n2 
        ) B 
    ) G 
Group By [PersonID],[LocationID],Grp 
Order By [PersonID],min(D) 

Returns

PersonID LocationID StartDate EndDate 
1   1   1980-07-30 2008-01-30 
1   2   2008-01-30 2009-11-06 
1   3   2014-07-16 2015-01-16 
1   1   2016-01-26 2999-12-31 

Using your original query

Select [PersonID] 
     ,[LocationID] 
     ,[StartDate] = min(D) 
     ,[EndDate] = max(D) 
From (
     Select * 
       ,Grp = Row_Number() over (Order By D) - Row_Number() over (Partition By [PersonID],[LocationID] Order By D) 
     From (
       -- Your Original Query 
       select 
        a.PersonID, 
        a.LocationID, 
        a.StartDate, 
        a.EndDate, 
        case when a.LocationID = b.LocationID then a.PK_ID else b.PK_ID end as NewID 
       from employees a 
       left outer join employees b 
       on a.PersonID = b.PersonID 
       and a.PK_ID = b.PK_ID - 1 
      ) A 
     Cross Apply (
         Select Top (DateDiff(DAY,A.[StartDate],A.[EndDate])+1) D=DateAdd(DAY,-1+Row_Number() Over (Order By (Select Null)),A.[StartDate]) 
         From master..spt_values n1,master..spt_values n2 
        ) B 
    ) G 
Group By [PersonID],[LocationID],Grp 
Order By [PersonID],min(D) 

Requested commentary

Let's break it down into its components.

1) The Cross Apply portion: this expands each record into N records, one per day in its date range. For example:

Declare @YourTable Table ([PersonID] int,[LocationID] int,[StartDate] date,[EndDate] date) 
Insert Into @YourTable Values 
(1,1,'1980-07-01','1980-07-03') 
,(1,1,'1980-07-02','1980-07-04') -- Notice the Overlap 
,(1,2,'2008-01-30','2008-02-05') 

Select * 
    from @YourTable A 
    Cross Apply (
       Select Top (DateDiff(DAY,A.[StartDate],A.[EndDate])+1) D=DateAdd(DAY,-1+Row_Number() Over (Order By (Select Null)),A.[StartDate]) 
       From master..spt_values n1,master..spt_values n2 
       ) B 

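The query above produces one row per calendar day for each source record: 3 rows for each of the first two records and 7 rows for the third, each carrying its day in column D.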
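To make the expansion step concrete outside SQL Server, here is a minimal Python sketch of the same idea. It is not the answer's code: `expand_to_days` is a hypothetical helper name, and the rows are the sample values from the T-SQL example above.

```python
from datetime import date, timedelta

# Same sample rows as the T-SQL example: (PersonID, LocationID, StartDate, EndDate)
rows = [
    (1, 1, date(1980, 7, 1), date(1980, 7, 3)),
    (1, 1, date(1980, 7, 2), date(1980, 7, 4)),  # overlaps the first row
    (1, 2, date(2008, 1, 30), date(2008, 2, 5)),
]

def expand_to_days(rows):
    """Yield (person, location, day) for every day in each inclusive range.

    Hypothetical helper, for illustration only.
    """
    for person, location, start, end in rows:
        # Mirrors TOP (DATEDIFF(DAY, start, end) + 1): both endpoints are included.
        for offset in range((end - start).days + 1):
            yield person, location, start + timedelta(days=offset)

expanded = list(expand_to_days(rows))
for row in expanded:
    print(row)
```

The overlapping days (1980-07-02 and 1980-07-03) appear twice in the expanded list. That is harmless for the later steps, because only the minimum and maximum day of each group are kept.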

2) The Grp portion: this is perhaps easier to follow with a simple example:

Declare @YourTable Table ([PersonID] int,[LocationID] int,[StartDate] date,[EndDate] date) 
Insert Into @YourTable Values 
(1,1,'1980-07-01','1980-07-03') 
,(1,1,'1980-07-02','1980-07-04') -- Notice the Overlap 
,(1,2,'2008-01-30','2008-02-05') 

Select * 
     ,Grp = Row_Number() over (Order By D) - Row_Number() over (Partition By [PersonID],[LocationID] Order By D) 
     ,RN1 = Row_Number() over (Order By D) 
     ,RN2 = Row_Number() over (Partition By [PersonID],[LocationID] Order By D) 
    from @YourTable A 
    Cross Apply (
       Select Top (DateDiff(DAY,A.[StartDate],A.[EndDate])+1) D=DateAdd(DAY,-1+Row_Number() Over (Order By (Select Null)),A.[StartDate]) 
       From master..spt_values n1,master..spt_values n2 
       ) B 

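The query above produces the expanded rows with Grp, RN1, and RN2 added: the six LocationID 1 rows all get Grp 0, and the seven LocationID 2 rows all get Grp 6.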

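RN1 and RN2 are Grp broken out into its two parts, just to illustrate the mechanics. Notice that RN1 minus RN2 equals Grp. Once we have Grp, it becomes a grouping key for a simple GROUP BY aggregation.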
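The row-number difference can be sketched in Python as follows. This is an illustration under assumptions rather than the answer's code: `with_group_key` is a hypothetical helper, and the day rows are a small made-up sample in which the person returns to location 1 after a stint at location 2.

```python
from collections import defaultdict
from datetime import date

# (PersonID, LocationID, D): illustrative rows, already expanded to one row per day.
days = [
    (1, 1, date(1980, 7, 1)),
    (1, 1, date(1980, 7, 2)),
    (1, 1, date(1980, 7, 3)),
    (1, 2, date(2008, 1, 30)),
    (1, 2, date(2008, 1, 31)),
    (1, 1, date(2016, 1, 26)),  # back at location 1
    (1, 1, date(2016, 1, 27)),
]

def with_group_key(days):
    """Return (person, location, day, rn1, rn2, grp) tuples, where grp = rn1 - rn2.

    rn1 numbers all rows by day; rn2 numbers rows by day within each
    (person, location) pair. Hypothetical helper, for illustration only.
    """
    seen = defaultdict(int)
    result = []
    for rn1, (person, location, day) in enumerate(sorted(days, key=lambda r: r[2]), start=1):
        seen[(person, location)] += 1
        rn2 = seen[(person, location)]
        result.append((person, location, day, rn1, rn2, rn1 - rn2))
    return result

for row in with_group_key(days):
    print(row)
```

Within an unbroken run at one location, both counters advance together, so their difference stays constant. When the person comes back to location 1, rn1 has also counted the location 2 rows while rn2 has not, so the second stint gets a different difference and therefore its own group.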

3) Pulling it all together:

Declare @YourTable Table ([PersonID] int,[LocationID] int,[StartDate] date,[EndDate] date) 
Insert Into @YourTable Values 
(1,1,'1980-07-01','1980-07-03') 
,(1,1,'1980-07-02','1980-07-04') -- Notice the Overlap 
,(1,2,'2008-01-30','2008-02-05') 

Select [PersonID] 
     ,[LocationID] 
     ,[StartDate] = min(D) 
     ,[EndDate] = max(D) 
From (
     Select * 
       ,Grp = Row_Number() over (Order By D) - Row_Number() over (Partition By [PersonID],[LocationID] Order By D) 
      from @YourTable A 
      Cross Apply (
         Select Top (DateDiff(DAY,A.[StartDate],A.[EndDate])+1) D=DateAdd(DAY,-1+Row_Number() Over (Order By (Select Null)),A.[StartDate]) 
         From master..spt_values n1,master..spt_values n2 
         ) B 
    ) G 
Group By [PersonID],[LocationID],Grp 
Order By [PersonID],min(D) 

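Returns one row per island: LocationID 1 from 1980-07-01 to 1980-07-04 (the two overlapping records merged) and LocationID 2 from 2008-01-30 to 2008-02-05.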


This solved my problem! Can you explain how it works? – DForck42


@DForck42 I'll add some commentary in a moment. –

1

For your sample data, you can use the difference-of-row-numbers approach:

select personid, locationid, min(startdate), max(enddate) 
from (select e.*, 
      row_number() over (partition by personid order by startdate) as seqnum_p, 
      row_number() over (partition by personid, locationid order by startdate) as seqnum_pl 
     from employees e 
    ) e 
group by (seqnum_p - seqnum_pl), personid, locationid; 

This assumes the start and end dates are contiguous. That is, there are no gaps for a particular employee at the same location.
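As a rough check of both the approach and its stated assumption, the same logic can be sketched in Python. `collapse` is a hypothetical helper, not part of the answer. The first data set is the sample from the question; the second is a made-up case with a gap at the same location.

```python
from collections import defaultdict

# (PersonID, LocationID, StartDate, EndDate) from the question.
# ISO-formatted date strings sort chronologically, so plain strings are enough here.
records = [
    (1, 1, "1980-07-30", "2007-07-16"),
    (1, 1, "2007-07-16", "2008-01-30"),
    (1, 2, "2008-01-30", "2009-03-02"),
    (1, 2, "2009-03-02", "2009-11-06"),
    (1, 3, "2014-07-16", "2015-01-16"),
    (1, 1, "2016-01-26", "2999-12-31"),
]

def collapse(records):
    """Collapse consecutive same-location records per person (hypothetical helper)."""
    seq_p = defaultdict(int)   # row number per person
    seq_pl = defaultdict(int)  # row number per (person, location)
    islands = {}
    for person, location, start, end in sorted(records, key=lambda r: (r[0], r[2])):
        seq_p[person] += 1
        seq_pl[(person, location)] += 1
        key = (person, location, seq_p[person] - seq_pl[(person, location)])
        first, last = islands.get(key, (start, end))
        islands[key] = (min(first, start), max(last, end))
    rows = [(p, loc, s, e) for (p, loc, _), (s, e) in islands.items()]
    return sorted(rows, key=lambda r: (r[0], r[2]))

for row in collapse(records):
    print(row)

# Made-up data: two stints at the same location with a gap and nothing in between.
gapped = [
    (1, 1, "2000-01-01", "2001-01-01"),
    (1, 1, "2005-01-01", "2006-01-01"),
]
print(collapse(gapped))
```

The second call merges the two stints into a single range from 2000-01-01 to 2006-01-01, because row numbers alone cannot see the gap between them. If such gaps can occur in the real data, the day-expansion approach from the first answer, or an explicit comparison of each StartDate with the previous EndDate, would be needed instead.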


Far faster than my answer. I just corrected two typos. +1 –