ID <- c("ID300","ID301","ID302","ID303","ID304","ID305","ID306","ID307","ID308","ID309")
Measurement <- c("Length","Length","Length","Length","Length","Length","Length","Length","Length","Length")
PASSFAIL <- c("FAIL","PASS","FAIL","FAIL#Pts","PASS","PASS","PASS","PASS","PASS","FAIL")
df1 <- data.frame(ID,Measurement,PASSFAIL)
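Part 1: I want to add a column to the data frame that holds the fail rate for each ID. I am trying to calculate it over a window of 5 IDs. For example: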
Fail Rate = (Number of Fails) / (Number of Fails + Number of Passes)
FR for ID300 = (fails in rows 1 to 5) / (total rows 1 to 5) = 3/5 = 0.6
Note: in df1, any PASSFAIL value that contains FAIL (for example FAIL#Pts) is counted as a fail.
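A minimal base R sketch of the counting rule and the single-window arithmetic above. It rebuilds df1 so it runs on its own, and it assumes that "contains FAIL" is tested with grepl(); the variable names is_fail and rows are illustrative only.

```r
# Rebuild the sample data so this snippet is self-contained
df1 <- data.frame(
  ID = sprintf("ID%d", 300:309),
  Measurement = "Length",
  PASSFAIL = c("FAIL", "PASS", "FAIL", "FAIL#Pts", "PASS",
               "PASS", "PASS", "PASS", "PASS", "FAIL"),
  stringsAsFactors = FALSE
)

# TRUE for any PASSFAIL value containing "FAIL", so "FAIL#Pts" counts as a fail
is_fail <- grepl("FAIL", df1$PASSFAIL)

# Fail rate for the first window (rows 1 to 5): fails / (fails + passes)
rows <- 1:5
fr_first <- sum(is_fail[rows]) / length(rows)
fr_first
# [1] 0.6
```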
It should also return NA when fewer than 5 rows are left for the window, so the output I need looks like this:
      ID Measurement PASSFAIL  FR
1  ID300      Length     FAIL 0.6
2  ID301      Length     PASS 0.4
3  ID302      Length     FAIL 0.4
4  ID303      Length FAIL#Pts 0.2
5  ID304      Length     PASS 0.0
6  ID305      Length     PASS 0.2
7  ID306      Length     PASS  NA
8  ID307      Length     PASS  NA
9  ID308      Length     PASS  NA
10 ID309      Length     FAIL  NA
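The expected FR column above is a forward-looking window: row i uses rows i to i + 4, and the last 4 rows get NA. One vectorized way to compute that in base R is with a running count of fails (cumsum) and differences between its entries. This is a sketch, not the asker's method; the helper name forward_fail_rate is hypothetical.

```r
df1 <- data.frame(
  ID = sprintf("ID%d", 300:309),
  Measurement = "Length",
  PASSFAIL = c("FAIL", "PASS", "FAIL", "FAIL#Pts", "PASS",
               "PASS", "PASS", "PASS", "PASS", "FAIL"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: forward-looking fail rate over k rows.
# Row i uses rows i..(i + k - 1); rows without a full window ahead get NA.
forward_fail_rate <- function(is_fail, k = 5) {
  n <- length(is_fail)
  fr <- rep(NA_real_, n)
  if (n < k) return(fr)
  cs <- c(0, cumsum(is_fail))   # cs[j + 1] = number of fails in rows 1..j
  starts <- seq_len(n - k + 1)  # rows that have k rows ahead (inclusive)
  fr[starts] <- (cs[starts + k] - cs[starts]) / k
  fr
}

df1$FR <- forward_fail_rate(grepl("FAIL", df1$PASSFAIL), k = 5)
df1
```

The resulting FR values should match the expected table: 0.6, 0.4, 0.4, 0.2, 0.0, 0.2, then NA for ID306 to ID309.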
Part 2: Once this is done, I need to recalculate every fail rate as new IDs are added, using the same window of 5. For example, I expect this output:
      ID Measurement PASSFAIL  FR
1  ID296      Length     PASS 0.4
2  ID297      Length     FAIL 0.6
3  ID298      Length     PASS 0.6
4  ID299      Length     FAIL 0.6
5  ID300      Length     FAIL 0.8
6  ID301      Length     FAIL 0.6
7  ID302      Length     PASS  NA
8  ID303      Length     FAIL  NA
9  ID304      Length FAIL#Pts  NA
10 ID305      Length     PASS  NA
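A sketch of Part 2, assuming new rows arrive as a separate data frame (new_rows, hypothetical) that is appended to df1, re-sorted by the numeric part of ID, and then run through the whole window calculation again. The new rows reuse the PASSFAIL values listed for ID296 to ID299 in the table above. Because this keeps df1's own PASSFAIL values for ID300 onward (the posted table lists different ones, e.g. FAIL for ID301), the FR values here need not match that table row for row.

```r
df1 <- data.frame(
  ID = sprintf("ID%d", 300:309),
  Measurement = "Length",
  PASSFAIL = c("FAIL", "PASS", "FAIL", "FAIL#Pts", "PASS",
               "PASS", "PASS", "PASS", "PASS", "FAIL"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: forward window of k rows, NA where fewer than k rows remain
forward_fail_rate <- function(is_fail, k = 5) {
  n <- length(is_fail)
  fr <- rep(NA_real_, n)
  if (n < k) return(fr)
  cs <- c(0, cumsum(is_fail))
  starts <- seq_len(n - k + 1)
  fr[starts] <- (cs[starts + k] - cs[starts]) / k
  fr
}

# Hypothetical batch of new IDs (values taken from the example table)
new_rows <- data.frame(
  ID = sprintf("ID%d", 296:299),
  Measurement = "Length",
  PASSFAIL = c("PASS", "FAIL", "PASS", "FAIL"),
  stringsAsFactors = FALSE
)

# Append, sort by the numeric part of ID, then recompute FR for every row
combined <- rbind(new_rows, df1)
combined <- combined[order(as.integer(sub("^ID", "", combined$ID))), ]
rownames(combined) <- NULL
combined$FR <- forward_fail_rate(grepl("FAIL", combined$PASSFAIL), k = 5)
combined
```

Recomputing the whole column each time is the simplest option; only the rows whose window now includes a new ID actually change.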
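I am currently calculating the fail rate like this, which computes a single value for the whole data frame. I don't know how to use a loop to calculate it sequentially for each ID with a window size of 5.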
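library(data.table)
setDT(df1)
# aggregate: one fail rate for the whole table (anything other than "PASS" counts as a fail)
df1 <- df1[, .(FR = (sum(PASSFAIL != "PASS")/.N))]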
Please provide some input.
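Since the question asks about a loop, here is one way the aggregate above could be turned into a per-row loop: the same PASSFAIL != "PASS" test, applied to rows i to i + 4 instead of the whole table. It is written in base R rather than data.table so it runs without extra packages; the names is_fail, k and last are illustrative.

```r
df1 <- data.frame(
  ID = sprintf("ID%d", 300:309),
  Measurement = "Length",
  PASSFAIL = c("FAIL", "PASS", "FAIL", "FAIL#Pts", "PASS",
               "PASS", "PASS", "PASS", "PASS", "FAIL"),
  stringsAsFactors = FALSE
)

is_fail <- df1$PASSFAIL != "PASS"  # same rule as the aggregate above
k <- 5                             # window size
n <- nrow(df1)

df1$FR <- NA_real_                 # rows without a full window stay NA
for (i in seq_len(n)) {
  last <- i + k - 1                # last row of the window starting at row i
  if (last <= n) {
    df1$FR[i] <- sum(is_fail[i:last]) / k
  }
}
df1
```

The loop is easy to follow; for large tables a vectorized rolling function will usually be faster.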
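I suggest you look at 'filter', or 'rollapply' from the 'zoo' package. E.g. 'filter(grepl("FAIL", df1$PASSFAIL), rep(1,5)/5, sides = 1)'. Also note that there is a 'by =' argument you can pass to 'data.table' to run functions within the groups defined by the 'by =' variable. – thelatemail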
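A sketch building on the comment's stats::filter() idea. With sides = 1, filter() averages the current value and the 4 values before it, while the question's expected output looks forward (rows i to i + 4). Reversing the input and then the output flips the window direction. The grouping line uses base R's ave() as a stand-in for data.table's by= (a swapped-in technique, not the comment's exact suggestion). The helper forward_mean is hypothetical. stats::filter is written with its namespace because dplyr::filter masks it when dplyr is loaded.

```r
df1 <- data.frame(
  ID = sprintf("ID%d", 300:309),
  Measurement = "Length",
  PASSFAIL = c("FAIL", "PASS", "FAIL", "FAIL#Pts", "PASS",
               "PASS", "PASS", "PASS", "PASS", "FAIL"),
  stringsAsFactors = FALSE
)

is_fail <- as.numeric(grepl("FAIL", df1$PASSFAIL))

# Hypothetical helper: forward-looking moving average over k values.
# filter(..., sides = 1) looks backward, so reverse before and after.
forward_mean <- function(x, k = 5) {
  if (length(x) < k) return(rep(NA_real_, length(x)))  # filter() errors on too-short input
  rev(as.numeric(stats::filter(rev(x), rep(1, k) / k, sides = 1)))
}

df1$FR <- forward_mean(is_fail, k = 5)

# Base R analogue of data.table's by=: run the window separately per Measurement
df1$FR_by_measurement <- ave(is_fail, df1$Measurement,
                             FUN = function(x) forward_mean(x, k = 5))
df1
```

In this sample every row has Measurement "Length", so both columns come out the same; the grouped version only differs once several measurement types are mixed in one table.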