2017-04-12

How do I calculate the duration (in minutes) per condition in my dataset?

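My dataset contains the start and finish times of many people (ID) working in different areas (Location) on various days (Day) of the week. An example of my dataset is below: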

> head(WeekOne, 15)
                 Start              Finish Day     ID Location
1  2017-04-12 00:00:00 2017-04-12 00:02:55  D1 Daniel   Office
2  2017-04-12 00:02:55 2017-04-12 00:06:18  D1 Daniel   Office
3  2017-04-12 00:06:18 2017-04-12 00:08:20  D1 Daniel   OnSite
4  2017-04-12 00:08:20 2017-04-12 00:08:40  D1 Daniel   OnSite
5  2017-04-12 00:08:40 2017-04-12 00:10:11  D1 Daniel   Travel
6  2017-04-12 00:10:11 2017-04-12 00:10:18  D1 Daniel   Travel
7  2017-04-12 00:10:18 2017-04-12 00:17:52  D1 Daniel   Travel
8  2017-04-12 00:17:52 2017-04-12 00:19:00  D1 Daniel   Travel
9  2017-04-12 00:19:00 2017-04-12 00:19:56  D1 Daniel   OnSite
10 2017-04-12 00:19:56 2017-04-12 00:28:48  D1 Daniel   OnSite
11 2017-04-12 00:00:00 2017-04-12 00:03:52  D2 Daniel   OnSite
12 2017-04-12 00:03:52 2017-04-12 00:04:05  D2 Daniel   Office
13 2017-04-12 00:04:05 2017-04-12 00:08:32  D2 Daniel   Office
14 2017-04-12 00:08:32 2017-04-12 00:16:01  D2 Daniel   Travel
15 2017-04-12 00:16:01 2017-04-12 00:25:35  D2 Daniel   OnSite

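I would like to know the total time, in minutes, that each ID spends in each Location over the week. The highest level of Day is D7, and I have a separate data.frame for each week, so I only need to iterate over Location and ID.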

What I have attempted

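I tried the code below, but it returns the minutes in a strange format and does not account for multiple visits to the same location on one day. For example, Daniel visits OnSite twice on D1.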

WeekOne %>% 
    group_by(ID, Location) %>% 
    summarise(Duration = max(Finish) - min(Start)) 
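
The "strange format" is most likely a `difftime` object: subtracting two date-times in R picks the unit automatically from the size of the difference. A minimal sketch, assuming `Start` and `Finish` are `POSIXct` columns (the two intervals below are made up for illustration):

```r
# Two hypothetical intervals, one short and one long
start  <- as.POSIXct(c("2017-04-12 00:00:00", "2017-04-12 00:00:00"), tz = "UTC")
finish <- as.POSIXct(c("2017-04-12 00:00:45", "2017-04-12 00:28:48"), tz = "UTC")

# Subtraction yields a difftime; its unit is chosen automatically
short <- finish[1] - start[1]
long  <- finish[2] - start[2]
units(short)  # "secs"
units(long)   # "mins"

# Asking for the unit explicitly gives a plain numeric number of minutes
mins <- as.numeric(difftime(finish, start, units = "mins"))
```

Requesting `units = "mins"` up front avoids having to guess which unit a given result came back in.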

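I did think of creating a new column, WeekOne$Level, that accounts for repeated visits and changes in Location. I could then iterate over each Level and use the code above. For example: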

> head(WeekOne, 15)
                 Start              Finish Day     ID Location Level
1  2017-04-12 00:00:00 2017-04-12 00:02:55  D1 Daniel   Office     1
2  2017-04-12 00:02:55 2017-04-12 00:06:18  D1 Daniel   Office     1
3  2017-04-12 00:06:18 2017-04-12 00:08:20  D1 Daniel   OnSite     2
4  2017-04-12 00:08:20 2017-04-12 00:08:40  D1 Daniel   OnSite     2
5  2017-04-12 00:08:40 2017-04-12 00:10:11  D1 Daniel   Travel     3
6  2017-04-12 00:10:11 2017-04-12 00:10:18  D1 Daniel   Travel     3
7  2017-04-12 00:10:18 2017-04-12 00:17:52  D1 Daniel   Travel     3
8  2017-04-12 00:17:52 2017-04-12 00:19:00  D1 Daniel   Travel     3
9  2017-04-12 00:19:00 2017-04-12 00:19:56  D1 Daniel   OnSite     4
10 2017-04-12 00:19:56 2017-04-12 00:28:48  D1 Daniel   OnSite     4
11 2017-04-12 00:00:00 2017-04-12 00:03:52  D2 Daniel   OnSite     5
12 2017-04-12 00:03:52 2017-04-12 00:04:05  D2 Daniel   Office     6
13 2017-04-12 00:04:05 2017-04-12 00:08:32  D2 Daniel   Office     6
14 2017-04-12 00:08:32 2017-04-12 00:16:01  D2 Daniel   Travel     7
15 2017-04-12 00:16:01 2017-04-12 00:25:35  D2 Daniel   OnSite     8

WeekOne %>% 
    group_by(ID, Level) %>% 
    summarise(Duration = max(Finish) - min(Start)) 

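However, I am unsure how to go about creating this column. Even with it added, the approach does not total the time per Location, seems cumbersome, and still returns the minutes in a strange format.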
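
For reference, one possible way to build such a Level column is to start a new level whenever ID, Day or Location differs from the previous row. This is only a sketch, assuming dplyr is loaded and the rows are already in chronological order within each person and day; the `visits` data frame and the `changed` helper column are hypothetical names used for illustration:

```r
library(dplyr)

# Hypothetical miniature of WeekOne: only the columns needed for Level
visits <- data.frame(
  Day      = c("D1", "D1", "D1", "D1", "D2", "D2"),
  ID       = "Daniel",
  Location = c("Office", "Office", "OnSite", "Travel", "Travel", "Office"),
  stringsAsFactors = FALSE
)

visits <- visits %>%
  mutate(
    # TRUE whenever ID, Day or Location differs from the previous row
    changed = ID != lag(ID, default = first(ID)) |
              Day != lag(Day, default = first(Day)) |
              Location != lag(Location, default = first(Location)),
    # A running count of changes gives one Level per uninterrupted visit
    Level = cumsum(changed) + 1L
  ) %>%
  select(-changed)
```

A Level column like this is only needed when the duration of each individual visit matters. For a total per ID and Location, summing the per-row durations is enough.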

My question

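How can I quickly and easily calculate the total duration that each ID spends in each Location? I would like the duration in minutes, rounded to the nearest minute. For example: 3 minutes.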

Answers

You first need to calculate the duration of each row, then sum it by ID and Location:

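library(dplyr)

WeekOne %>% 
     # Per-row duration in seconds, with the unit stated explicitly
     mutate(Duration = as.numeric(difftime(Finish, Start, units = "secs"))) %>% 
     group_by(ID, Location) %>% 
     # Sum the seconds per group and convert to minutes
     summarize(Total_Duration = round(sum(Duration)/60, 1)) 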

What format is 'Total_Duration' in? For example, I am given a value of 933.50000238419, but how do I get 'Total_Duration' in minutes? – user2716568

It looks like you are in seconds at the moment, so just divide by 60 to get the number of minutes –
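
Putting the answer and the comment together, a self-contained sketch on the D1 rows of the question's sample might look like this. It assumes dplyr 1.0.0 or later (for the `.groups` argument) and that `Start` and `Finish` are `POSIXct`; the `times`, `stamp`, `Minutes` and `Total_Minutes` names are introduced here for illustration only:

```r
library(dplyr)

# The first ten rows of the question's sample (Daniel, D1), rebuilt by hand
times <- c("00:00:00", "00:02:55", "00:06:18", "00:08:20", "00:08:40",
           "00:10:11", "00:10:18", "00:17:52", "00:19:00", "00:19:56",
           "00:28:48")
stamp <- as.POSIXct(paste("2017-04-12", times), tz = "UTC")

WeekOne <- data.frame(
  Start    = stamp[-length(stamp)],  # every timestamp except the last
  Finish   = stamp[-1],              # every timestamp except the first
  Day      = "D1",
  ID       = "Daniel",
  Location = c("Office", "Office", "OnSite", "OnSite", "Travel",
               "Travel", "Travel", "Travel", "OnSite", "OnSite"),
  stringsAsFactors = FALSE
)

totals <- WeekOne %>%
  # Per-row duration as a plain number of minutes
  mutate(Minutes = as.numeric(difftime(Finish, Start, units = "mins"))) %>%
  group_by(ID, Location) %>%
  # Sum all visits to a Location, then round to the nearest minute
  summarise(Total_Minutes = round(sum(Minutes)), .groups = "drop")
```

On these rows the totals come out as 6 minutes for Office, 12 for OnSite and 10 for Travel. Because each row is summed separately, Daniel's two OnSite visits are both counted, whereas `max(Finish) - min(Start)` for OnSite would span 22.5 minutes and include the Travel time in between.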