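Sum a column based on consecutive factor instances
I have a dataset of fish tag signals, and I want to calculate the duration of different behaviours based on swimming speed (e.g. static, cruise, burst) so that I can calculate behavioural state frequencies. I have done this with a for loop, but it is slow on my large dataset. I am sure this can be done with one of R's apply functions, but I can't work out how.
Here is what my data looks like: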
    Period PEN SEC BLSEC     BS BScount CountTF BSdur
380   7045   7   7 0.204 cruise       2   FALSE    NA
381   7045   7   7 0.694 cruise       3   FALSE    NA
382   7045   7   7 0.325 cruise       4    TRUE    21
383   7045   7   7 0.000 static       1    TRUE     7
384   7045   7   7 0.197 cruise       1   FALSE    NA
385   7045   7   7 0.312 cruise       2   FALSE    NA
386   7045   7   7 0.242 cruise       3    TRUE    21
387   7045   7   7 0.096 static       1    TRUE     7
388   7045   7   7 0.274 cruise       1   FALSE    NA
389   7045   7   7 0.268 cruise       2   FALSE    NA
390   7045   7   7 0.312 cruise       3   FALSE    NA
391   7045   7   7 0.694 cruise       4   FALSE    NA
392   7045   7   7 0.268 cruise       5   FALSE    NA
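SEC is the number of seconds between tag pings (it is not always 7!), and BLSEC is body lengths per second (i.e. the distance the fish swam between tag pings, normalised by body length). I calculated BS, BScount and CountTF like this: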
static = 0.1
cruise = 1
bsffile$BS <- ifelse(bsffile$BLSEC <= static, 'static', ifelse(bsffile$BLSEC > static & bsffile$BLSEC <= cruise, 'cruise', 'burst'))
bsffile$BScount <- sequence(rle(bsffile$BS)$lengths)
bsffile$CountTF <- c(ifelse(diff(bsffile$BScount, 1, 1) < 1, T, F), F)
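As a side note, the nested ifelse that assigns BS can also be written with base R's cut(). The following is only a minimal sketch: the thresholds are the ones from the question, while the BLSEC values here are made up for illustration (including a burst value that does not occur in the sample data).

```r
# Thresholds from the question
static <- 0.1
cruise <- 1

# Illustrative speeds (hypothetical values, chosen to hit every class and both boundaries)
BLSEC <- c(0.204, 0, 0.096, 1, 1.5, 0.1)

# cut() uses right-closed intervals by default: (-Inf, 0.1], (0.1, 1], (1, Inf)
# which matches the "<= static" and "<= cruise" tests in the nested ifelse
BS <- as.character(cut(BLSEC,
                       breaks = c(-Inf, static, cruise, Inf),
                       labels = c("static", "cruise", "burst")))

# The nested ifelse from the question, for comparison
BS_ifelse <- ifelse(BLSEC <= static, "static",
                    ifelse(BLSEC > static & BLSEC <= cruise, "cruise", "burst"))
```

Both versions should give the same labels; cut() mainly helps if more speed classes are added later.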
BSdur is the sum of SEC over each run of consecutive identical behavioural states. I calculated it with this loop:
bssum <- 0
for (i in 1:nrow(bsffile)){
  bssum <- bssum + bsffile[i, 'SEC']
  if(bsffile[i, 'CountTF'] == T & is.na(bsffile[i, 'SEC']) == F){
    bsffile[i,'BSdur'] <- bssum
    bssum <- 0
  } else {
    bsffile[i,'BSdur'] <- NA
  }
}
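The loop above accumulates SEC until a run of identical BS values ends. The same run totals can be sketched without a loop by taking a cumulative sum of SEC and differencing it at the run ends found by rle(). This is a sketch under assumptions, not a drop-in replacement: it uses a small toy vector copied from the first eight rows of the sample data, assumes SEC contains no NA values, and the names `runs` and `state_freq` are illustrative.

```r
# Toy data: BS and SEC for the first eight rows of the sample data
BS  <- c("cruise", "cruise", "cruise", "static",
         "cruise", "cruise", "cruise", "static")
SEC <- c(7, 7, 7, 7, 7, 7, 7, 7)

# Run-length encoding of the behavioural state
r <- rle(BS)

# Index of the last row of each run
ends <- cumsum(r$lengths)

# Cumulative seconds at each run end; differencing gives the seconds per run
duration <- diff(c(0, cumsum(SEC)[ends]))

# One row per run: state, number of pings, total duration in seconds
runs <- data.frame(state = r$values, n_pings = r$lengths, duration = duration)

# Number of runs per behavioural state
state_freq <- table(runs$state)
```

With one row per run, state frequencies and duration summaries can be computed directly from `runs` rather than from the NA-padded BSdur column.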
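It takes about five minutes to run on my dataset. Any suggestions on how to make this faster, e.g. using one of the apply functions?
Here is some dput output to play with: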
structure(list(Period = c(7045, 7045, 7045, 7045, 7045, 7045,
7045, 7045, 7045, 7045, 7045, 7045, 7045, 7045, 7045, 7045, 7045,
7045, 7045, 7045, 7045), PEN = structure(c(1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L
), .Label = c("7", "8"), class = "factor"), SEC = c(7, 7, 7,
7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 8, 7, 7, 7, 7, 7), BLSEC = c(0.204,
0.694, 0.325, 0, 0.197, 0.312, 0.242, 0.096, 0.274, 0.268, 0.312,
0.694, 0.268, 0.541, 0.796, 0.306, 0.089, 0.93, 0.389, 0.452,
0.917), BS = c("cruise", "cruise", "cruise", "static", "cruise",
"cruise", "cruise", "static", "cruise", "cruise", "cruise", "cruise",
"cruise", "cruise", "cruise", "cruise", "static", "cruise", "cruise",
"cruise", "cruise"), BScount = c(2L, 3L, 4L, 1L, 1L, 2L, 3L,
1L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 1L, 1L, 2L, 3L, 4L), CountTF = c(FALSE,
FALSE, TRUE, TRUE, FALSE, FALSE, TRUE, TRUE, FALSE, FALSE, FALSE,
FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, FALSE, FALSE, FALSE,
TRUE), BSdur = c(NA, NA, 21, 7, NA, NA, 21, 7, NA, NA, NA, NA,
NA, NA, NA, 57, 7, NA, NA, NA, 28)), row.names = 380:400, .Names = c("Period",
"PEN", "SEC", "BLSEC", "BS", "BScount", "CountTF", "BSdur"
), class = "data.frame")
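akrun did this with ave.
Thank you for your answer. I can't get your code to work: all the calculated values are 7. – Adamaki
@Adamaki I am using your 'dput' output, and it gives me the same result as shown in my post – akrun
I believe you are right. Not sure why it doesn't work for me... – Adamaki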
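The answer's code is not reproduced in this thread, so the following is only a hedged sketch of how ave could be used here, not akrun's original code. It assumes SEC has no NA values, and it also closes the final run, whereas CountTF in the question always ends with FALSE. The helper names `run_id`, `run_sum` and `is_last` are illustrative.

```r
# Toy data: SEC and BLSEC for the first eight rows of the dput output
bsffile <- data.frame(
  SEC   = c(7, 7, 7, 7, 7, 7, 7, 7),
  BLSEC = c(0.204, 0.694, 0.325, 0, 0.197, 0.312, 0.242, 0.096)
)
static <- 0.1
cruise <- 1
bsffile$BS <- ifelse(bsffile$BLSEC <= static, "static",
                     ifelse(bsffile$BLSEC <= cruise, "cruise", "burst"))

# Run id: one integer per run of consecutive identical BS values
r <- rle(bsffile$BS)
run_id <- rep(seq_along(r$lengths), r$lengths)

# ave() returns a vector as long as its input, holding each run's SEC total
run_sum <- ave(bsffile$SEC, run_id, FUN = sum)

# Keep the total only on the last row of each run, NA elsewhere
is_last <- c(diff(run_id) != 0, TRUE)
bsffile$BSdur <- ifelse(is_last, run_sum, NA)
```

Grouping by `run_id` rather than by BS itself matters: grouping by BS would pool every cruise row in the data into a single total instead of one total per consecutive run.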