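I've done something similar on many occasions: essentially, grouping on the breaks within a complex ordering. The basics of the method I use are:
- Build a table of all the time ranges of interest.
- Find the start time of each group of time ranges of interest.
- Find the end time of each group of time ranges of interest.
- Join the start and end times back to the list of time ranges and groups.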
Or, in more detail (each step could be part of one big CTE, but I've broken it out into temp tables for readability):
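Step 1: Build the list of all time ranges of interest (I used a method similar to the one @Brad linked). Note: as @Manfred Sorg pointed out, this assumes there are no "missing seconds" in the bus data. If there is a break in the timestamps, this code will interpret a single range as two (or more) separate ranges.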
```sql
;with stopSeconds as (
  select BusID, BusStopID, TimeStamp,
    -- midnight of the day the reading was taken
    [date] = cast(datediff(dd, 0, TimeStamp) as datetime),
    -- constant across an unbroken run of one-second readings for a bus
    [grp] = dateadd(ss, -row_number() over(partition by BusID order by TimeStamp), TimeStamp)
  from #test
  where BusStopID is not null
)
select BusID, BusStopID, date,
  -- start/end as a time of day on day 0, so ranges from different days can be compared
  [sTime] = dateadd(ss, datediff(ss, date, min(TimeStamp)), 0),
  [eTime] = dateadd(ss, datediff(ss, date, max(TimeStamp)), 0),
  [secondsOfStop] = datediff(ss, min(TimeStamp), max(TimeStamp)),
  -- rank of each range by time of day (across all days), per bus and stop
  [sOrd] = row_number() over(partition by BusID, BusStopID order by datediff(ss, date, min(TimeStamp))),
  [eOrd] = row_number() over(partition by BusID, BusStopID order by datediff(ss, date, max(TimeStamp)))
into #ranges
from stopSeconds
group by BusID, BusStopID, date, grp
```
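The core of Step 1 is the "timestamp minus row number" trick: number each bus's readings in time order and subtract that many seconds from the timestamp, and the result stays constant across an unbroken run of one-second readings. The answer itself is T-SQL; below is only a minimal Python sketch of the same logic, assuming readings arrive as hypothetical `(bus_id, stop_id, timestamp)` tuples with `stop_id` set to `None` while the bus is moving. The function name `stop_ranges` and the sample data are made up for illustration.

```python
from datetime import datetime, timedelta
from itertools import groupby


def stop_ranges(readings):
    """Collapse (bus_id, stop_id, timestamp) readings into stop ranges.

    Readings with no stop are dropped, the rest are numbered per bus in
    timestamp order, and that number of seconds is subtracted from each
    timestamp. The result is constant while readings are exactly one second
    apart, so it identifies each unbroken run.
    """
    at_stop = sorted((r for r in readings if r[1] is not None),
                     key=lambda r: (r[0], r[2]))
    ranges = {}
    for bus_id, rows in groupby(at_stop, key=lambda r: r[0]):
        for n, (_, stop_id, ts) in enumerate(rows, start=1):
            grp = ts - timedelta(seconds=n)           # dateadd(ss, -row_number(), TimeStamp)
            key = (bus_id, stop_id, ts.date(), grp)   # group by BusID, BusStopID, date, grp
            lo, hi = ranges.get(key, (ts, ts))
            ranges[key] = (min(lo, ts), max(hi, ts))  # min/max(TimeStamp)
    return sorted((bus, stop, lo, hi)
                  for (bus, stop, _, _), (lo, hi) in ranges.items())


# Invented sample data for a single bus on one day.
day = datetime(2010, 12, 6)


def at(h, m, s):
    return day.replace(hour=h, minute=m, second=s)


readings = [
    (1, 10, at(8, 0, 0)), (1, 10, at(8, 0, 1)), (1, 10, at(8, 0, 2)),
    (1, None, at(8, 0, 3)),                        # moving between stops
    (1, 11, at(8, 5, 0)), (1, 11, at(8, 5, 1)),
    (1, 10, at(8, 10, 0)), (1, 10, at(8, 10, 2)),  # 08:10:01 is missing
]
ranges = stop_ranges(readings)
```

The deliberately missing 08:10:01 reading splits one visit to stop 10 into two ranges, which is exactly the "missing seconds" caveat mentioned above.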
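Step 2: Find the earliest start time of each group of stops. A range starts a new group unless the previous range at the same stop (ordered by time of day) began within 10 minutes of it.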
```sql
-- keep only ranges with no predecessor (by sOrd) starting within 10 minutes
select this.BusID, this.BusStopID, this.sTime minSTime,
  [stopOrder] = row_number() over(partition by this.BusID, this.BusStopID order by this.sTime)
into #starts
from #ranges this
left join #ranges prev on this.BusID = prev.BusID
  and this.BusStopID = prev.BusStopID
  and this.sOrd = prev.sOrd + 1
  and this.sTime between dateadd(mi, -10, prev.sTime) and dateadd(mi, 10, prev.sTime)
where prev.BusID is null
```
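The start-of-group test boils down to: sort a stop's ranges by start time of day, and begin a new group whenever a start is more than 10 minutes after the previous one (`between` is inclusive, so exactly 10 minutes still counts as the same group). A rough Python equivalent, assuming times of day are plain seconds since midnight; `secs`, `group_starts` and the sample ranges are hypothetical:

```python
from collections import defaultdict


def secs(h, m, s=0):
    """Time of day as seconds since midnight (stand-in for sTime/eTime)."""
    return h * 3600 + m * 60 + s


def group_starts(ranges, window=600):
    """Return {(bus_id, stop_id): [minSTime, ...]} in stopOrder.

    ranges holds (bus_id, stop_id, start, end) tuples. A range opens a new
    group unless the range just before it (ordered by start, i.e. sOrd)
    started within `window` seconds (10 minutes) of it.
    """
    by_stop = defaultdict(list)
    for bus_id, stop_id, start, _end in ranges:
        by_stop[(bus_id, stop_id)].append(start)
    starts = {}
    for key, times in by_stop.items():
        times.sort()  # sOrd
        # no matching predecessor -> this range starts a group
        starts[key] = [t for i, t in enumerate(times)
                       if i == 0 or t - times[i - 1] > window]
    return starts


ranges = [  # (bus, stop, start, end), collected over several days
    (1, 10, secs(8, 0), secs(8, 1)),
    (1, 10, secs(8, 6), secs(8, 7)),
    (1, 10, secs(8, 12), secs(8, 13)),
    (1, 10, secs(17, 0), secs(17, 2)),
    (1, 10, secs(17, 5), secs(17, 6)),
]
starts = group_starts(ranges)
```

Because each range is compared only with its immediate predecessor, groups can chain: 08:00, 08:06 and 08:12 form one group even though the first and last are 12 minutes apart.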
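Step 3: Find the latest end time of each group of stops, using the mirror-image rule: a range ends a group unless the next range at the same stop (ordered by time of day) ended within 10 minutes of it.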
```sql
-- keep only ranges with no successor (by eOrd) ending within 10 minutes
select this.BusID, this.BusStopID, this.eTime maxETime,
  [stopOrder] = row_number() over(partition by this.BusID, this.BusStopID order by this.eTime)
into #ends
from #ranges this
left join #ranges next on this.BusID = next.BusID
  and this.BusStopID = next.BusStopID
  and this.eOrd = next.eOrd - 1
  and this.eTime between dateadd(mi, -10, next.eTime) and dateadd(mi, 10, next.eTime)
where next.BusID is null
```
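The end-of-group test can be sketched the same way in Python: sort a stop's end times of day and close a group whenever the next end is more than 10 minutes later, or there is no next end. This assumes times of day as seconds since midnight; `secs`, `group_ends` and the sample ranges are hypothetical:

```python
from collections import defaultdict


def secs(h, m, s=0):
    """Time of day as seconds since midnight (stand-in for sTime/eTime)."""
    return h * 3600 + m * 60 + s


def group_ends(ranges, window=600):
    """Return {(bus_id, stop_id): [maxETime, ...]} in stopOrder.

    ranges holds (bus_id, stop_id, start, end) tuples. A range closes a
    group unless the range just after it (ordered by end, i.e. eOrd)
    ended within `window` seconds (10 minutes) of it.
    """
    by_stop = defaultdict(list)
    for bus_id, stop_id, _start, end in ranges:
        by_stop[(bus_id, stop_id)].append(end)
    ends = {}
    for key, times in by_stop.items():
        times.sort()  # eOrd
        # no matching successor -> this range ends a group
        ends[key] = [t for i, t in enumerate(times)
                     if i == len(times) - 1 or times[i + 1] - t > window]
    return ends


ranges = [  # (bus, stop, start, end), collected over several days
    (1, 10, secs(8, 0), secs(8, 1)),
    (1, 10, secs(8, 6), secs(8, 7)),
    (1, 10, secs(8, 12), secs(8, 13)),
    (1, 10, secs(17, 0), secs(17, 2)),
    (1, 10, secs(17, 5), secs(17, 6)),
]
ends = group_ends(ranges)
```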
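Step 4: Join everything together. Each start is paired with the end that has the same stopOrder, every range that falls inside that window is aggregated, and groups seen on only one day are discarded as noise.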
```sql
select r.BusID, r.BusStopID,
  -- average stop length in seconds (AVG over an int returns a truncated int)
  [avgLengthOfStop] = avg(datediff(ss, r.sTime, r.eTime)),
  [earliestStop] = min(r.sTime),
  [latestDepart] = max(r.eTime)
from #starts s
-- pair each group's start with its end via stopOrder
join #ends e on s.BusID = e.BusID
  and s.BusStopID = e.BusStopID
  and s.stopOrder = e.stopOrder
-- collect every range that lies inside the group's window
join #ranges r on r.BusID = s.BusID
  and r.BusStopID = s.BusStopID
  and r.sTime between s.minSTime and e.maxETime
  and r.eTime between s.minSTime and e.maxETime
group by r.BusID, r.BusStopID, s.stopOrder
having count(distinct r.date) > 1 -- filters out the "noise" (groups seen on only one day)
```
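The final join pairs the n-th start with the n-th end for each bus and stop, then aggregates the ranges inside each window. A Python sketch of that join plus the `having` filter, assuming hypothetical tuple layouts: ranges as `(bus_id, stop_id, day, start, end)` with times of day in seconds, and `starts`/`ends` as dicts of per-stop lists in stopOrder standing in for `#starts` and `#ends`. The names `secs` and `summarize` and all sample values are invented:

```python
from datetime import date


def secs(h, m, s=0):
    """Time of day as seconds since midnight (stand-in for sTime/eTime)."""
    return h * 3600 + m * 60 + s


def summarize(ranges, starts, ends):
    """Return (bus_id, stop_id, avgLengthOfStop, earliestStop, latestDepart)
    for every group whose ranges were seen on more than one day."""
    out = []
    for key, first_times in starts.items():
        # join #starts to #ends on BusID, BusStopID and stopOrder
        for s, e in zip(first_times, ends.get(key, [])):
            members = [r for r in ranges
                       if (r[0], r[1]) == key and s <= r[3] <= e and s <= r[4] <= e]
            if len({r[2] for r in members}) > 1:  # having count(distinct r.date) > 1
                lengths = [r[4] - r[3] for r in members]
                out.append((key[0], key[1],
                            sum(lengths) // len(lengths),  # truncating int average
                            min(r[3] for r in members),
                            max(r[4] for r in members)))
    return out


ranges = [
    (1, 10, date(2010, 12, 6), secs(8, 0), secs(8, 3)),
    (1, 10, date(2010, 12, 7), secs(8, 2), secs(8, 6)),
    (1, 10, date(2010, 12, 6), secs(17, 0), secs(17, 1)),  # one day only
]
starts = {(1, 10): [secs(8, 0), secs(17, 0)]}
ends = {(1, 10): [secs(8, 6), secs(17, 1)]}
summary = summarize(ranges, starts, ends)
```

The 17:00 group appears on a single day, so the day-count filter drops it, just as the `having` clause does.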
Finally, for completeness, clean up:
```sql
drop table #ends
drop table #starts
drop table #ranges
```
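Is `BusID` really supposed to increment for every timestamp? Also, since `Timestamp` is actually a data type in SQL, I'd recommend against using it as a column name, but I understand you chose a meaningful name (as distinct from the name of the data type itself). – Brad 2010-12-07 13:26:20
Oops, I made a mistake when typing the schema into StackOverflow. You're right: the breadcrumb ID increments, and BusID is an FK. – 2010-12-07 13:27:20
Interesting problem, more complex than it looks. – smirkingman 2010-12-08 13:05:20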