Automatically calculating summary statistics for a data frame and creating a new table
I have the following data frame:
col1 <- c("avi","chi","chi","bov","fox","bov","fox","avi","bov",
"chi","avi","chi","chi","bov","bov","fox","avi","bov","chi")
col2 <- c("low","med","high","high","low","low","med","med","med","high",
"low","low","high","high","med","med","low","low","med")
col3 <- c(0,1,1,1,0,1,0,0,0,0,0,0,1,1,1,1,0,1,0)
test_data <- cbind(col1, col2, col3)
test_data <- as.data.frame(test_data)
and I would like to end up with a table like this (the values are random):
Species  Pop.density  %Resistance  CI_low  CI_high  Total samples
avi      low          2.0          1.2     2.2      30
avi      med          0            0       0.5      20
avi      high         3.5          2.9     4.2      10
chi      low          0.5          0.3     0.7      20
chi      med          2.0          1.9     2.1      150
chi      high         6.5          6.2     6.6      175
The %Resistance column is based on col3 above, where 1 = resistant and 0 = not resistant. I have tried the following:
library(dplyr)
test_data<-test_data %>%
count(col1,col2,col3) %>%
group_by(col1, col2) %>%
mutate(perc_res = prop.table(n)*100)
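This almost does the trick: I get the percentage of 1s and 0s in col3 for every combination of col1 and col2. The total samples are wrong, though, because I am counting over all three columns, when the correct count would be over col1 and col2 only.
For the confidence interval I would use the following: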
binom.test(resistant_samples, total_samples)$conf.int * 100
But I don't know how to combine this with the rest. Is there a quick and simple way to do this?
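To make the confidence-interval call concrete, here is a minimal sketch for a single group. The counts (2 resistant out of 3 samples) are made up for illustration. binom.test() returns the exact (Clopper-Pearson) interval at the default 95% level, and $conf.int is a length-two vector holding the lower and upper bounds as proportions.

```r
# Hypothetical counts for one (col1, col2) group
resistant_samples <- 2
total_samples <- 3

# Exact binomial confidence interval, rescaled from proportions to percent
ci <- binom.test(resistant_samples, total_samples)$conf.int * 100

ci_low  <- ci[1]  # lower bound, in percent
ci_high <- ci[2]  # upper bound, in percent
```

Because the result is just two numbers per group, each bound can be stored in its own column once the resistant and total counts per group are known.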
I would suggest using group_by and then the summarise function. – Jul
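One way this suggestion might look, as a sketch rather than a definitive answer: group by col1 and col2 only, then compute the counts, the percentage and the interval inside summarise(). It assumes the data frame is built with data.frame() so that col3 stays numeric, and a dplyr version that supports the .groups argument (1.0.0 or later). The names resistant, total, CI_low and CI_high are chosen here for illustration.

```r
library(dplyr)

col1 <- c("avi","chi","chi","bov","fox","bov","fox","avi","bov",
          "chi","avi","chi","chi","bov","bov","fox","avi","bov","chi")
col2 <- c("low","med","high","high","low","low","med","med","med","high",
          "low","low","high","high","med","med","low","low","med")
col3 <- c(0,1,1,1,0,1,0,0,0,0,0,0,1,1,1,1,0,1,0)

# Assumption: data.frame() instead of cbind(), so col3 stays numeric
test_data <- data.frame(col1, col2, col3)

summary_table <- test_data %>%
  group_by(col1, col2) %>%
  summarise(
    resistant = sum(col3),               # number of 1s in the group
    total     = n(),                     # samples per (col1, col2) pair
    perc_res  = 100 * resistant / total, # percent resistant
    CI_low    = 100 * binom.test(resistant, total)$conf.int[1],
    CI_high   = 100 * binom.test(resistant, total)$conf.int[2],
    .groups   = "drop"
  )

print(summary_table)
```

Counting inside summarise() after grouping on two columns avoids the problem in the question, where count(col1, col2, col3) splits each group's total across the 0 and 1 rows.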
Use 'data.frame(col1, col2, col3)' instead of 'cbind', which coerces every column to a string here. – Frank
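A small sketch of the difference, using shortened vectors rather than the full data from the question: cbind() builds a matrix, and a matrix can hold only one type, so the numeric column is converted to text before as.data.frame() ever sees it.

```r
col1 <- c("avi", "chi", "bov")
col3 <- c(0, 1, 1)

# cbind() on mixed types gives a character matrix; col3 is no longer numeric
# (it becomes character, or factor in R versions before 4.0)
via_cbind <- as.data.frame(cbind(col1, col3))
class(via_cbind$col3)

# data.frame() keeps each column's own type, so col3 stays numeric
via_df <- data.frame(col1, col3)
class(via_df$col3)

# Arithmetic such as sum() only works on the numeric version
sum(via_df$col3)
```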
Your example data has no ("avi", "high") pair. Do you want that row to appear anyway (with NAs and a sample count of zero)? – Frank
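If rows for unobserved pairs are wanted, one possible approach in base R is to build every combination with expand.grid() and left-join the summary onto it. The counts table below is hypothetical and deliberately tiny; it only illustrates the mechanics.

```r
# Hypothetical per-group totals; the ("chi", "med") pair is not observed
counts <- data.frame(
  col1  = c("avi", "avi", "chi"),
  col2  = c("low", "med", "low"),
  total = c(3, 1, 1)
)

# Every combination of the observed values of col1 and col2
all_pairs <- expand.grid(
  col1 = unique(counts$col1),
  col2 = unique(counts$col2),
  stringsAsFactors = FALSE
)

# Left join: pairs missing from counts come back with NA in total
full <- merge(all_pairs, counts, by = c("col1", "col2"), all.x = TRUE)

# An unobserved pair has zero samples; percentage or CI columns would stay NA
full$total[is.na(full$total)] <- 0

print(full)
```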