2015-09-14

Cutting by 2 or more variables in R using data.table

I have a data frame:

df <- data.frame(time = c("2015-09-07 00:32:19", "2015-09-07 01:02:30", "2015-09-07 01:31:36", "2015-09-07 01:47:45", 
"2015-09-07 02:00:17", "2015-09-07 02:07:30", "2015-09-07 03:39:41", "2015-09-07 04:04:21", "2015-09-07 04:04:21", "2015-09-07 04:04:22"), 
inOut = c("IN", "OUT", "IN", "IN", "IN", "IN", "IN", "OUT", "IN", "OUT")) 

> df 
        time inOut 
1 2015-09-07 00:32:19 IN 
2 2015-09-07 01:02:30 OUT 
3 2015-09-07 01:31:36 IN 
4 2015-09-07 01:47:45 IN 
5 2015-09-07 02:00:17 IN 
6 2015-09-07 02:07:30 IN 
7 2015-09-07 03:39:41 IN 
8 2015-09-07 04:04:21 OUT 
9 2015-09-07 04:04:21 IN 
10 2015-09-07 04:04:22 OUT 
> 

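I want to count the number of IN/OUT events in each fixed 15-minute interval. I can do this by creating two more data frames, in_df and out_df, cutting each of them into 15-minute intervals, and then merging them to get my result. outdf is my expected result.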

in_df <- df[which(df$inOut== "IN"),] 
out_df <- df[which(df$inOut== "OUT"),] 

a <- data.frame(table(cut(as.POSIXct(in_df$time), breaks="15 mins"))) 
b <- data.frame(table(cut(as.POSIXct(out_df$time), breaks="15 mins"))) 
colnames(b) <- c("Time", "Out") 
colnames(a) <- c("Time", "In") 

outdf <- merge(a,b, all=TRUE) 
outdf[is.na(outdf)] <- 0 

> outdf 
        Time In Out 
1 2015-09-07 00:32:00 1 0 
2 2015-09-07 00:47:00 0 0 
3 2015-09-07 01:02:00 0 1 
4 2015-09-07 01:17:00 1 0 
5 2015-09-07 01:32:00 0 0 
6 2015-09-07 01:47:00 2 0 
7 2015-09-07 02:02:00 1 0 
8 2015-09-07 02:17:00 0 0 
9 2015-09-07 02:32:00 0 0 
10 2015-09-07 02:47:00 0 0 
11 2015-09-07 03:02:00 0 0 
12 2015-09-07 03:17:00 0 0 
13 2015-09-07 03:32:00 1 0 
14 2015-09-07 03:47:00 0 0 
15 2015-09-07 04:02:00 1 2 
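One caveat about the base R approach above: cut() starts its bins at the first timestamp it sees, so cutting in_df and out_df separately only lines up when the first IN and first OUT times happen to fall on the same 15-minute grid (here 00:32 and 01:02 do). A minimal sketch that cuts the full column once and counts both values in a single two-way table; the tz = "UTC" argument and the names bins, counts and wide are assumptions added for illustration:

```r
# Same example data as in the question
df <- data.frame(
  time = c("2015-09-07 00:32:19", "2015-09-07 01:02:30", "2015-09-07 01:31:36",
           "2015-09-07 01:47:45", "2015-09-07 02:00:17", "2015-09-07 02:07:30",
           "2015-09-07 03:39:41", "2015-09-07 04:04:21", "2015-09-07 04:04:21",
           "2015-09-07 04:04:22"),
  inOut = c("IN", "OUT", "IN", "IN", "IN", "IN", "IN", "OUT", "IN", "OUT")
)

# Cut the whole time column once so IN and OUT rows share the same bins
# (tz = "UTC" is an assumption, used to keep the labels independent of locale)
bins <- cut(as.POSIXct(df$time, tz = "UTC"), breaks = "15 mins")

# A two-way table counts every bin/inOut combination, including empty bins
counts <- table(Time = bins, inOut = df$inOut)

# Keep the wide layout: one row per bin, one column per inOut value
wide <- as.data.frame.matrix(counts)
wide <- data.frame(Time = rownames(wide), In = wide$IN, Out = wide$OUT)
print(wide)
```

Because the bins come from one cut() call, no merge step or NA replacement is needed.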

My question is: how can I do this with data.table and get the same result?

Answer

With data.table, I would do:

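library(data.table) 
setDT(df)  # convert df to a data.table by reference

# table() only reports levels it knows about, so inOut must be a factor;
# since R 4.0.0, data.frame() no longer converts strings to factors by default
df[, inOut := factor(inOut)]

# assign each row to a 15-minute bin; bins start at the minute of the earliest time
df[, timeCut := cut(as.POSIXct(time), breaks="15 mins")] 

# join on every bin (including empty ones) and count IN/OUT within each bin
df[J(timeCut = levels(timeCut)), 
    as.list(table(inOut)), 
    on = "timeCut", 
    by = .EACHI] 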

This gives:

   timeCut IN OUT 
1: 2015-09-07 00:32:00 1 0 
2: 2015-09-07 00:47:00 0 0 
3: 2015-09-07 01:02:00 0 1 
4: 2015-09-07 01:17:00 1 0 
5: 2015-09-07 01:32:00 0 0 
6: 2015-09-07 01:47:00 2 0 
7: 2015-09-07 02:02:00 1 0 
8: 2015-09-07 02:17:00 0 0 
9: 2015-09-07 02:32:00 0 0 
10: 2015-09-07 02:47:00 0 0 
11: 2015-09-07 03:02:00 0 0 
12: 2015-09-07 03:17:00 0 0 
13: 2015-09-07 03:32:00 1 0 
14: 2015-09-07 03:47:00 0 0 
15: 2015-09-07 04:02:00 1 2 

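Explanation. The last part has the form DT[i=J(x=my_x), j, on="x", by=.EACHI], which can be read as:

  1. Join DT$x on my_x.
  2. Then evaluate j for each subset determined by my_x.

In this case, j=as.list(table(inOut)). The table must be coerced to a list to create multiple columns (one for each level of inOut).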
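The two steps above can be tried on a small example. This is only a sketch: the table DT, the key values in my_x, and the result name res are hypothetical and not taken from the question. inOut is created as a factor explicitly so that table() reports both levels even for a key with no matching rows.

```r
library(data.table)

# Hypothetical toy table, not the data from the question
DT <- data.table(
  x = c("a", "a", "c"),
  inOut = factor(c("IN", "OUT", "IN"), levels = c("IN", "OUT"))
)

# Keys to look up; "b" has no matching row in DT
my_x <- c("a", "b", "c")

# Join DT$x on my_x, then evaluate j once for each value of my_x.
# as.list() turns the table into a list, so each level becomes its own column.
res <- DT[J(x = my_x), as.list(table(inOut)), on = "x", by = .EACHI]
print(res)
```

The unmatched key "b" still gets a row in the result, which is how the empty 15-minute bins end up with zero counts in the answer above.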

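Nice approach with '.EACHI'. – akrun

@Frank, thanks, your data.table solution is very nice and clear. I will mark this as the answer and create another question for the "dplyr" solution. –

@JamesChen Okay, fair enough. I'm interested to see what people come up with, too. I don't know how dplyr can create multiple columns from a 'table' result. – Frank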