2017-04-06 41 views
0

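Sum a column based on consecutive instances of a factor

I have a dataset of fish tag signals, and I want to calculate the duration of different behaviours based on swimming speed (e.g. static, cruise, burst) so that I can calculate behavioural state frequencies. I have done this with a for loop, but it is slow on my large dataset. I'm sure this can be done with one of R's apply functions, but I can't figure out how.

This is what my data looks like: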

    Period PEN SEC BLSEC     BS BScount CountTF BSdur
380   7045   7   7 0.204 cruise       2   FALSE    NA
381   7045   7   7 0.694 cruise       3   FALSE    NA
382   7045   7   7 0.325 cruise       4    TRUE    21
383   7045   7   7 0.000 static       1    TRUE     7
384   7045   7   7 0.197 cruise       1   FALSE    NA
385   7045   7   7 0.312 cruise       2   FALSE    NA
386   7045   7   7 0.242 cruise       3    TRUE    21
387   7045   7   7 0.096 static       1    TRUE     7
388   7045   7   7 0.274 cruise       1   FALSE    NA
389   7045   7   7 0.268 cruise       2   FALSE    NA
390   7045   7   7 0.312 cruise       3   FALSE    NA
391   7045   7   7 0.694 cruise       4   FALSE    NA
392   7045   7   7 0.268 cruise       5   FALSE    NA

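SEC is the number of seconds between tag pings (it's not always 7!), and BLSEC is body lengths per second (i.e. the distance swum by the fish between tag pings, normalised by body length). I calculated BS, BScount and CountTF like this: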

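static <- 0.1  # BLSEC at or below this is 'static'
cruise <- 1    # BLSEC above static and up to this is 'cruise'; anything higher is 'burst'

bsffile$BS <- ifelse(bsffile$BLSEC <= static, 'static', ifelse(bsffile$BLSEC <= cruise, 'cruise', 'burst'))
# position of each row within its run of identical consecutive BS values
bsffile$BScount <- sequence(rle(bsffile$BS)$lengths)
# TRUE on the last row of a run (the counter drops or stays at 1 on the next row)
bsffile$CountTF <- c(diff(bsffile$BScount) < 1, FALSE)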
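
To make the run-length step concrete, here is a minimal sketch on a toy vector. The vector `bs` and the names `runs`, `bs_count` and `count_tf` are made up for illustration and are not part of the dataset; the calls themselves (`rle`, `sequence`, `diff`) are the same base R functions used above.

```r
# Toy vector of behavioural states (hypothetical values, not from the dataset)
bs <- c("cruise", "cruise", "static", "cruise", "cruise", "cruise")

# rle() compresses the vector into runs of identical consecutive values
runs <- rle(bs)
runs$lengths   # length of each run
runs$values    # the state each run holds

# sequence() turns the run lengths into a within-run counter (like BScount)
bs_count <- sequence(runs$lengths)

# A row closes a run when the counter does not increase on the next row;
# FALSE is appended for the final row, as in the question's code
count_tf <- c(diff(bs_count) < 1, FALSE)

data.frame(bs, bs_count, count_tf)
```

Note that, built this way, the very last row is always FALSE, so the final run of the data is never flagged as finished.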

BSdur is the sum of SEC over each run of consecutive rows in the same behavioural state. I calculate it like this:

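bssum <- 0

for (i in 1:nrow(bsffile)) {
    # accumulate seconds for the current run of behaviour
    bssum <- bssum + bsffile[i, 'SEC']
    if (bsffile[i, 'CountTF'] && !is.na(bsffile[i, 'SEC'])) {
        # last row of the run: store the total and reset the accumulator
        bsffile[i, 'BSdur'] <- bssum
        bssum <- 0
    } else {
        bsffile[i, 'BSdur'] <- NA
    }
}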

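This takes about five minutes to run on my dataset. Any suggestions on how to make it faster, e.g. using one of the apply functions?

Here is some dput output to play with: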

structure(list(Period = c(7045, 7045, 7045, 7045, 7045, 7045, 
7045, 7045, 7045, 7045, 7045, 7045, 7045, 7045, 7045, 7045, 7045, 
7045, 7045, 7045, 7045), PEN = structure(c(1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L 
), .Label = c("7", "8"), class = "factor"), SEC = c(7, 7, 7, 
7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 8, 7, 7, 7, 7, 7), BLSEC = c(0.204, 
0.694, 0.325, 0, 0.197, 0.312, 0.242, 0.096, 0.274, 0.268, 0.312, 
0.694, 0.268, 0.541, 0.796, 0.306, 0.089, 0.93, 0.389, 0.452, 
0.917), BS = c("cruise", "cruise", "cruise", "static", "cruise", 
"cruise", "cruise", "static", "cruise", "cruise", "cruise", "cruise", 
"cruise", "cruise", "cruise", "cruise", "static", "cruise", "cruise", 
"cruise", "cruise"), BScount = c(2L, 3L, 4L, 1L, 1L, 2L, 3L, 
1L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 1L, 1L, 2L, 3L, 4L), CountTF = c(FALSE, 
FALSE, TRUE, TRUE, FALSE, FALSE, TRUE, TRUE, FALSE, FALSE, FALSE, 
FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, FALSE, FALSE, FALSE, 
TRUE), BSdur = c(NA, NA, 21, 7, NA, NA, 21, 7, NA, NA, NA, NA, 
NA, NA, NA, 57, 7, NA, NA, NA, 28)), row.names = 380:400, .Names = c("Period", 
"PEN", "SEC", "BLSEC", "BS", "BScount", "CountTF", "BSdur" 
), class = "data.frame") 

Answers

2

This is easy with data.table:

library(data.table) 
setDT(bsffile) 
bsffile[,BSdur:=ifelse(CountTF==T,sum(SEC),0),by=.(rleid(BS))] 
0

We can use ave from base R:

df1$BSdur <- with(df1, ave(SEC, cumsum(c(TRUE, BS[-1]!= BS[-nrow(df1)])), FUN = sum)*CountTF) 
df1$BSdur 
#[1] 0 0 21 7 0 0 21 7 0 0 0 0 0 0 0 57 7 0 0 0 28 
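
The ave approach above puts 0 on rows that do not close a run, whereas the BSdur column in the question uses NA there. A minimal sketch of a variant that returns NA instead, assuming BS is a character column as in the dput output (rle does not accept factors). The small `df1` below is rebuilt from the first eight rows of the question's data, and `runs`, `run_id` and `run_total` are illustrative names only.

```r
# First eight rows of SEC, BS and CountTF from the question's dput output
df1 <- data.frame(
  SEC     = c(7, 7, 7, 7, 7, 7, 7, 7),
  BS      = c("cruise", "cruise", "cruise", "static",
              "cruise", "cruise", "cruise", "static"),
  CountTF = c(FALSE, FALSE, TRUE, TRUE, FALSE, FALSE, TRUE, TRUE),
  stringsAsFactors = FALSE
)

# Label each run of identical consecutive BS values with an integer id
runs   <- rle(df1$BS)
run_id <- rep(seq_along(runs$lengths), runs$lengths)

# Sum SEC within each run, then keep the total only on the flagged row
run_total <- ave(df1$SEC, run_id, FUN = sum)
df1$BSdur <- ifelse(df1$CountTF, run_total, NA)

df1$BSdur
```

Everything here is vectorised, so there is no row-by-row loop over the data frame.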

@akrun Thank you for your answer using ave. I couldn't get your code to work; all the calculated values are 7. – Adamaki


@Adamaki I am using your 'dput' output, and it gives me the same result as shown in my post. – akrun


I believe you are right. Not sure why it isn't working for me... – Adamaki